apify-link-prospecting-outreach
Link Prospecting Outreach
Turn a goal + a target keyword + a URL the user wants to promote into a tiered, ready-to-send outreach list: SERP-ranking prospects with Ahrefs-scored authority, the strongest pitch angle per prospect, an outreach-type-matched email draft, and a copy-paste-ready link placement.
Prerequisites
(No need to check it upfront)
- `.env` file with `APIFY_TOKEN`
- Ahrefs MCP available (the skill calls `mcp__claude_ai_Ahrefs__*` tools for prospect scoring)
- Node.js 20.6+ (for native `--env-file` support)
- One-time setup inside the skill's `scripts/` folder: `npm install`
Helper scripts (one config, four steps)
After Step 1–2 inputs are collected, write them to a single `campaign.json` (schema in `campaign.json.example`). Every downstream script reads `--config campaign.json`, so the agent doesn't fork per-campaign copies. Sequence:

```bash
# 1. Run the Actor (writes {base}.json + sub-Actor sidecars when --fetch-sub-datasets)
node --env-file=.env scripts/run_actor.js --actor "apify/link-prospecting-tool" --input '<json>' --timeout 1800 --fetch-sub-datasets --output {base}.json --format json

# 2. Build unified prospect table from the sidecars
python3 scripts/build_prospects.py --config campaign.json

# 3. (After Step 5 Ahrefs MCP calls → save to {base}_ahrefs_domain.json + {base}_ahrefs_page.json)
python3 scripts/enrich_prospects.py --config campaign.json

# 4. (After Step 8 sub-agents write outputs to /tmp/placement_outputs/row_*.json)
python3 scripts/merge_subagent_outputs.py --config campaign.json --outputs-dir /tmp/placement_outputs

# 5. Write the final xlsx + metadata sidecar
python3 scripts/write_xlsx.py --config campaign.json
```

If the runner's client-side wait elapses with the Actor still running on Apify, use `scripts/fetch_run_artifacts.js --run-id <id> --output {base}.json` instead of restarting. If the parent run is missing `SUB_ACTOR_RESULTS` (post-2026-05-20 Actor schema), `scripts/fetch_subactors_from_log.js` resolves sub-Actor runIds from the parent log.

Workflow
Copy this checklist and track progress:
Task Progress:
- [ ] Step 1: Collect required anchor inputs incl. goal (block on these)
- [ ] Step 2: Collect brand voice, partnership type, output format
- [ ] Step 3: Run apify/link-prospecting-tool
- [ ] Step 4: Pull leads, mentions, authors, and sub-Actor datasets
- [ ] Step 5: Enrich every domain with Ahrefs metrics, assign Prospect Tier
- [ ] Step 6: Run skip pass — flag rows to drop before drafting
- [ ] Step 7: Compute "Why This Prospect" tag per surviving row
- [ ] Step 8: Compose per-row 3-artifact placement + outreach-type-aware email
- [ ] Step 9: Render output in chosen format

Step 1: Required Anchor Inputs (ask FIRST, before anything else)
Do NOT proceed to Step 2 until every required input is answered. Surface them as the very first interaction. The dedup input (#7) is optional but must still be explicitly asked.
1. Concrete goal for this campaign: pick one preset or supply custom text. The goal drives skip-pass filtering, outreach-type template selection, and Prospect Tier thresholds. Required.

   | Preset | Effect downstream |
   |---|---|
   | `Recover unlinked brand mentions` | Skip pass drops every row where `brand_mentioned_in_source` is `false`. Default outreach type = `unlinked-mention-claim`. |
   | `Replace competitor links` | Skip pass drops every row not tagged `Links to competitor`. Default outreach type = `competitor-link-replacement`. |
   | `Topical authority links to specific URL` | No filter. Tier thresholds tighten (DR ≥ 50 for tier A). Default outreach type chosen per-row from `Why This Prospect`. |
   | `Maximum link volume from any relevant site` | No filter. Tier thresholds relax (DR ≥ 30 for tier A). Default outreach type chosen per-row. |
   | `Custom` | User-supplied paragraph; biases email tone and tier weights. No automatic skip filter. |

2. Target keyword(s): one or more keywords the user wants their link to appear next to. The skill prospects the SERP for each. At least one required.

3. Brand name: the user's brand or product name. The Actor will not run without this (it is the `brand` input field).

4. Product/category description: one or two sentences describing what the user sells, who they sell to, and what category their product fits in. Example: "Apify: web scraping platform that runs serverless scrapers as APIs. We sell to developers and data teams who need scraped data without managing infrastructure." Required. Used in Step 6 (topical-fit gate) and Step 7 (adversarial-mention detection) to recognise prospects who are in the same product category; those won't link no matter the pitch. Without this, the skill cannot distinguish a genuine editorial opportunity from a competitor's blog.

5. URL of content to link to: the destination URL that will be inserted into partner articles. Required.

6. Competitors: anyone in the user's product category who would publish an "ours vs theirs" comparison page on their own site. Frame the ask this way explicitly: "List every company that would write an X-vs-YourBrand comparison page. These won't link to you no matter what; small competitors count too." Encourage 10+ entries; most users default to listing 3–5 obvious ones and miss the long tail. Mapped to `competitorDomains` on the Actor and reused in Steps 6 (adversarial-mention skip) and 7 (`Links to competitor` Why-tag).

   After the user answers, offer (do not push) an Ahrefs auto-pull of organic competitors: "Want me to pull your top organic competitors from Ahrefs and add them to this list? Adds ~50 API units and surfaces smaller competitors you may have missed." If the user says yes and Ahrefs MCP is available, call `mcp__claude_ai_Ahrefs__site-explorer-organic-competitors` on the user's domain (extracted from input #5) and merge results into `competitorDomains`. If Ahrefs is unavailable or the user declines, proceed with the user-supplied list only.

7. Already-pitched domains (optional): domains the user has already contacted in past campaigns. Accept a comma-separated list, a CSV/Sheet path, or "none". The skill drops these in the skip pass so the user doesn't double-pitch. Not required to proceed.

8. Number of organic results per keyword: how many Google organic SERP results to prospect per keyword. Default 10 if the user is unsure, but ask the question so the user knows the lever exists. Mapped to `organicResult`.

9. LLM sources to track: multi-select. Each enabled engine queries an additional AI search/chat surface and adds Google Search Scraper sub-Actor cost per result fetched. Default: all enabled. Mapping to Actor input flags:

   | Option | Actor flag | Cost impact |
   |---|---|---|
   | ChatGPT Search | `enableChatGpt` | Per-result Google Search Scraper cost |
   | Gemini | `enableGemini` | Per-result Google Search Scraper cost |
   | Copilot (Microsoft / Bing) | `enableCopilot` | Per-result Google Search Scraper cost |
   | Perplexity | `enablePerplexity` | Per-result Google Search Scraper cost |
   | Google AI Mode | `enableAiMode` | Per-result Google Search Scraper cost |
   | Google AI Overviews | `enableAiOverviews` | Free: parsed from the SERP already fetched. Keep on regardless of budget. |

   Surface the multi-select to the user with all six pre-checked. Disabling individual engines is the main cost-cutting lever short of dropping `organicResult`; recommend keeping ChatGPT + Gemini on at minimum (they capture the largest share of LLM-driven discovery traffic in 2026).

10. Run email verification?: boolean. Default: `yes`. Mapped to `enableEmailVerification` on the Actor. When enabled, the Actor verifies every email returned by the Contact Details Scraper sub-Actor and tags each lead with a verification status (`verified` / `catch-all` / `risky` / `invalid` / `unknown`). The skill uses the status in Step 6 (invalid emails get auto-skipped) and surfaces it as the `Email Verification` column in the output. Disable only if the user is rate-limited on verification quota or running cost-tight smoke tests.

Once 1–6 and 8–10 are captured (7 is optional), move on.
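The goal presets from input #1 can be held in one lookup so that later steps read a single structure. The following is a minimal sketch, not the skill's actual code: `GoalPreset`, `GOAL_PRESETS`, `CUSTOM`, and `resolve_goal` are hypothetical names, and the values only mirror what the preset table states.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GoalPreset:
    """Hypothetical container for the downstream effects of one goal."""
    skip_filter: Optional[str]            # plain-text skip rule, None = no filter
    default_outreach_type: Optional[str]  # None = chosen per row
    tier_a_min_dr: Optional[int]          # only set where the preset table gives a DR floor


# "Custom" and any free-text goal: no automatic skip filter, per-row outreach type.
CUSTOM = GoalPreset(None, None, None)

GOAL_PRESETS = {
    "Recover unlinked brand mentions": GoalPreset(
        "drop rows where brand_mentioned_in_source is false",
        "unlinked-mention-claim",
        None,
    ),
    "Replace competitor links": GoalPreset(
        "drop rows not tagged 'Links to competitor'",
        "competitor-link-replacement",
        None,
    ),
    "Topical authority links to specific URL": GoalPreset(None, None, 50),
    "Maximum link volume from any relevant site": GoalPreset(None, None, 30),
}


def resolve_goal(goal_text: str) -> GoalPreset:
    """Return the preset for a known goal; anything else is treated as Custom."""
    return GOAL_PRESETS.get(goal_text, CUSTOM)
```

Treating unknown text as `CUSTOM` matches the table's statement that a custom goal has no automatic skip filter; how a custom paragraph biases tone and tier weights is outside this sketch.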
Step 2: Secondary Inputs
Ask these next:

1. Brand info and voice: a short paragraph describing the product/brand and the tone for outreach (e.g., "casual and helpful", "formal B2B", "founder-led"). Used verbatim to shape every generated email.

2. Partnership type: the offer the user is willing to make. Determines the offer paragraph substituted into the per-row email. Outreach-type template selection happens separately, per-row, in Step 8.

   | Option | What it offers |
   |---|---|
   | ABC link exchange | Three-way link swap: partner links to user, user links to a third party, third party links to partner. |
   | Direct A B link exchange | Two-way link swap: partner links to user, user links to partner. |
   | Resource page / list inclusion | Ask to be added to an existing curated list or roundup. No reciprocal link offered. |
   | Unilateral ask (no reciprocal) | User asks for the link without offering anything in return; appropriate for unlinked-mention claims and broken-link replacements. |
   | Other | User types their own offer (paid placement, free product, co-authored content, etc.). |

3. Output format:

   | Format | Behavior |
   |---|---|
   | `xlsx` | `run_actor.js` writes a styled spreadsheet to disk. |
   | `markdown` | Agent renders the table inline in chat with email drafts beneath each row. |
Step 3: Run the Actor
The Actor ID is `apify/link-prospecting-tool`. Full input schema lives in `reference/apify-actor-usage.md`.

Recommended call payload for this skill (defaults chosen for outreach-first workflow):

```json
{
  "queries": "<keyword 1>\n<keyword 2>",
  "brand": "<user's brand name>",
  "ownDomains": ["<user-domain.com>"],
  "competitorDomains": [],
  "ignoreDomains": [
    "wikipedia.org", "github.com", "stackoverflow.com", "stackexchange.com",
    "reddit.com", "quora.com", "youtube.com", "twitter.com", "x.com",
    "linkedin.com", "facebook.com", "medium.com", "archive.org",
    "chromewebstore.google.com", "addons.mozilla.org", "apps.apple.com",
    "play.google.com", "microsoftedge.microsoft.com", "marketplace.visualstudio.com"
  ],
  "organicResult": 10,
  "maxContactsPerDomain": 3,
  "department": ["marketing"],
  "searchAuthorName": true,
  "includeMention": true,
  "enableChatGpt": true,
  "enableGemini": true,
  "enableCopilot": true,
  "enablePerplexity": true,
  "enableAiMode": true,
  "enableAiOverviews": true,
  "enableEmailVerification": true
}
```

The six LLM-source flags map 1:1 to the user's Step 1 input #9 multi-select. Pass `false` for the `enable*` flag of any engine the user deselected. `enableEmailVerification` maps to Step 1 input #10.

The default `ignoreDomains` includes two groups:
- Giants and UGC (wikipedia, github, stackoverflow, reddit, etc.): too broad to pitch as editorial partners.
- App / extension marketplaces (Chrome Web Store, Firefox Add-ons, Apple/Google Play, VS Code Marketplace, etc.): product directory listings, no editorial decision-makers.

Do NOT auto-add to `ignoreDomains` (let the user decide):
- UGC/community sites like `kaggle.com`, `dev.to`, `substack.com`, `producthunt.com`, `g2.com`, `capterra.com`, `trustpilot.com`: some users get real value pitching these.
- API directories like `rapidapi.com`, `programmableweb.com`, `publicapis.dev`: relevant for some products (especially developer-tool brands), irrelevant for others. Surface these as candidates only if the user wants to add them.

The URL-pattern skip rules in Step 6 catch the per-row noise (subdomain prefixes, path patterns) that `ignoreDomains` can't express.

`department` is set to `["marketing"]` in the payload above; the values listed for it are `marketing`, `sales`, and `c_suite`.

Call the runner script:

```bash
node --env-file=.env ${CLAUDE_PLUGIN_ROOT}/scripts/run_actor.js \
  --actor "apify/link-prospecting-tool" \
  --input 'JSON_INPUT' \
  --timeout 1800 \
  --fetch-sub-datasets \
  --output YYYY-MM-DD_outreach.json \
  --format json
```

Notes:
- `--timeout 1800` is the recommended client-side wait. The Actor itself runs 15-50+ min depending on keyword count, LLM-engine fan-out, and `enableEmailVerification`. Past calibration runs land in the 20–55 min range. Bumping the default avoids the partial-result situation where the runner gives up but the Actor keeps going.
- If the client-side wait still elapses with the Actor still running on Apify (status `RUNNING` or `READY` when the runner exits), do not restart the Actor. Use `scripts/fetch_run_artifacts.js --run-id <id> --output <file>` to poll the existing run and download all artifacts; the output shape is the same as `run_actor.js --fetch-sub-datasets`.
- `--fetch-sub-datasets` downloads sibling files alongside the main output: `*_mentions.json`, `*_authors.json`, `*_serp.json`, `*_wcc.json`. You need all of them to populate every output column.
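The recommended payload in this step can be assembled from the Step 1 answers rather than edited by hand. A minimal sketch, assuming a hypothetical helper `build_actor_input` and a hypothetical `ENGINE_FLAGS` table keyed by the option labels from Step 1 input #9; only the JSON keys themselves come from the payload shown in this step.

```python
import json

# Option label from Step 1 input #9 -> Actor input flag.
ENGINE_FLAGS = {
    "ChatGPT Search": "enableChatGpt",
    "Gemini": "enableGemini",
    "Copilot": "enableCopilot",
    "Perplexity": "enablePerplexity",
    "Google AI Mode": "enableAiMode",
    "Google AI Overviews": "enableAiOverviews",
}


def build_actor_input(keywords, brand, own_domains, competitor_domains,
                      ignore_domains, organic_result=10,
                      deselected_engines=(), verify_emails=True):
    """Build the Actor input dict; every engine is on unless deselected."""
    unknown = set(deselected_engines) - set(ENGINE_FLAGS)
    if unknown:
        raise ValueError(f"Unknown engine label(s): {sorted(unknown)}")
    payload = {
        "queries": "\n".join(keywords),  # one keyword per line
        "brand": brand,
        "ownDomains": list(own_domains),
        "competitorDomains": list(competitor_domains),
        "ignoreDomains": list(ignore_domains),
        "organicResult": organic_result,
        "maxContactsPerDomain": 3,
        "department": ["marketing"],
        "searchAuthorName": True,
        "includeMention": True,
        "enableEmailVerification": verify_emails,
    }
    for label, flag in ENGINE_FLAGS.items():
        payload[flag] = label not in deselected_engines
    return payload


payload = build_actor_input(
    keywords=["web scraping api", "serverless scraper"],
    brand="ExampleBrand",                 # placeholder values for illustration
    own_domains=["example.com"],
    competitor_domains=["competitor.example"],
    ignore_domains=["wikipedia.org", "reddit.com"],
    deselected_engines=["Perplexity"],
)
actor_input_json = json.dumps(payload)    # string to pass as --input
```

Rejecting unknown engine labels up front keeps a typo from silently leaving a paid engine enabled.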
Step 4: Access All Datasets
The Actor's output schema changed on or before 2026-05-20. The build_prospects script must handle the new shape; older skill versions that joined a separate MENTIONS dataset are broken.

Current schema (verified 2026-05-20):

| File written by runner / fetcher | Source | Populates |
|---|---|---|
| | "All leads" dataset | |
| | Google Search Results Scraper sub-Actor (one item per | |
| | Website Content Crawler sub-Actor | |
| | AI Web Scraper sub-Actor (when | |

What changed (vs. pre-2026-05-20 runs):
- No separate `MENTIONS` / `AUTHORS` / `DOMAINS_WITH_LEADS` named datasets. Mention info is folded into `main_leads[i].source_url[]`.
- No `SUB_ACTOR_RESULTS` record in the parent run's key-value store. Sub-Actor runIds are now only discoverable from the parent run log via regex `\[apify\.<slug> runId:([A-Za-z0-9]+)\]`. The runner script's `--fetch-sub-datasets` flag now falls back to log-parsing when the KV index is missing; the standalone `scripts/fetch_subactors_from_log.js` does the same for runs whose runner already exited.
- The mentions schema reduced: `source_url[i]` carries only `{domain, brand_mentioned_in_source, url}`, with no per-engine flags like the old `ChatGPT_mention` / `Perplexity_mention`. Engine attribution must be reconstructed from the SERP sub-dataset's LLM-result sub-fields (see SERP row above).
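The log-parsing fallback described in the list above can be illustrated in a few lines. This is a sketch, not the contents of `scripts/fetch_subactors_from_log.js`: the function name is hypothetical, the `<slug>` placeholder is assumed to consist of letters, digits, and hyphens, and the sample log lines and slugs are invented for the example.

```python
import re

# Assumption: <slug> is letters, digits, and hyphens; the source gives only the placeholder.
RUN_ID_PATTERN = re.compile(r"\[apify\.([A-Za-z0-9-]+) runId:([A-Za-z0-9]+)\]")


def extract_sub_actor_run_ids(log_text: str) -> dict[str, list[str]]:
    """Map each sub-Actor slug to its run IDs, in first-seen order, without duplicates."""
    run_ids: dict[str, list[str]] = {}
    for slug, run_id in RUN_ID_PATTERN.findall(log_text):
        ids = run_ids.setdefault(slug, [])
        if run_id not in ids:
            ids.append(run_id)
    return run_ids


# Invented log excerpt; real parent-run logs may carry other text around the marker.
sample_log = (
    "INFO [apify.google-search-scraper runId:abc123] started\n"
    "INFO [apify.website-content-crawler runId:XyZ9] started\n"
    "INFO [apify.google-search-scraper runId:abc123] finished\n"
)
found = extract_sub_actor_run_ids(sample_log)
```

Deduplicating per slug matters because the same marker can appear on several log lines for one run.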
If a column's source is missing, write `"Not found"` and add a manual-lookup hint in `Notes`. Never fabricate.
Step 5: Ahrefs Enrichment and Prospect Tier
For every unique domain that survived the Actor's filtering, fetch authority and traffic metrics via Ahrefs MCP. Call all three tools in parallel per domain, and batch-parallelise across domains to keep this step under a minute for typical 20–50 prospect lists:

| Ahrefs tool | Used for | Column it populates |
|---|---|---|
| | Domain Rating | |
| | Page-level organic traffic (last 30 days) | |
| | Referring domains count | |

If Ahrefs returns no data (domain not indexed, page too new), set the column to `"-"` and add a hint in `Notes`: `"Ahrefs has no data — verify manually before pitching"`. Do not fabricate values.

Assign `Prospect Tier` using the thresholds matching the user's goal:

| Goal | Tier A | Tier B | Tier C |
|---|---|---|---|
| | DR ≥ 50 AND Page Traffic ≥ 300/mo | DR 30–49 OR Page Traffic 50–299 | everything below |
| | DR ≥ 30 AND Page Traffic ≥ 100/mo | DR 15–29 OR Page Traffic 20–99 | everything below |
| | irrelevant: every mention is worth claiming; tier by DR alone (≥ 40 = A, 20–39 = B, < 20 = C) | | |
| | tier by DR (≥ 50 = A, 30–49 = B, < 30 = C) | | |
| | use the | | |

Surface the tier breakdown to the user before Step 8 and let them confirm whether to draft emails for all tiers or only A/B.
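The tier thresholds can be expressed as two small functions, one for the rows that combine DR with page traffic and one for the DR-only rows. This is a sketch under stated assumptions: the function and constant names are hypothetical, rows with missing Ahrefs data are expected to be handled by the caller, and a row that clears the Tier B floor on either metric without meeting both Tier A conditions is treated as Tier B (an interpretation of "DR 30–49 OR Page Traffic 50–299", which the table does not spell out for mixed cases such as DR 60 with low traffic).

```python
def tier_by_dr_and_traffic(dr: float, page_traffic: float, *,
                           a_dr: int, a_traffic: int,
                           b_dr: int, b_traffic: int) -> str:
    """Tier A needs both metrics; Tier B needs either B floor; otherwise Tier C."""
    if dr >= a_dr and page_traffic >= a_traffic:
        return "A"
    if dr >= b_dr or page_traffic >= b_traffic:
        return "B"
    return "C"


def tier_by_dr(dr: float, *, a_min: int, b_min: int) -> str:
    """DR-only tiering, e.g. a_min=40, b_min=20 for the '≥ 40 = A, 20–39 = B' row."""
    if dr >= a_min:
        return "A"
    if dr >= b_min:
        return "B"
    return "C"


# Threshold sets copied from the first two rows of the tier table.
TIGHT = dict(a_dr=50, a_traffic=300, b_dr=30, b_traffic=50)
RELAXED = dict(a_dr=30, a_traffic=100, b_dr=15, b_traffic=20)

example_tier = tier_by_dr_and_traffic(55, 400, **TIGHT)
```

Keeping the thresholds as data makes it easy to show the user which set was applied when the tier breakdown is surfaced.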
Step 6: Skip Pass
Before drafting any email, walk every row and apply skip rules. Skipped rows get `Outreach Status = "Skip"`, a one-line reason in `Notes`, and no email or placement is generated (saves tokens and user review time).

Skip rules (in order):

1. Goal mismatch. If the goal is `Recover unlinked brand mentions` and the row's Mentions data shows `brand_mentioned_in_source: false`, skip. If the goal is `Replace competitor links` and the row's WCC body has no outbound link to any `competitorDomains` entry, skip.

2. Already pitched. If the row's domain matches an entry in the optional already-pitched list from Step 1 input #7, skip.

3. Own / competitor domain leak. The Actor should already filter these, but double-check: if the row's domain matches `ownDomains` or `competitorDomains`, skip.

4. Stale content. If `Publish Date` is older than 5 years, skip (low chance the editor will update the post).

5. URL-pattern skip. Skip rows whose URL matches any of these patterns:
   - Subdomain prefixes: `developers.*`, `docs.*`, `support.*`, `helpcenter.*`, `legacy.*`, `dsarequests.*`, `connectivity.*`, `community.*`, `dev.*` (when used as a doc subdomain, e.g. `dev.example.com/api/`), `api.*` only when followed by a path that's clearly documentation (`/reference/`, `/docs/`, `/spec/`). Do NOT skip `rapidapi.com` or other API-directory domains by this rule alone; `api.*` is a subdomain check, not a substring check.
   - Path patterns: `/api-docs/`, `/reference/`, `/marketplace/`, `/extensions/`, `/profile/`, `/users/`, `/free-tools/`, `/spec/`, `/content/privacy`, `/content/terms`, `/content/dma`, `/content/how_we_work`, `/legal/`, `/_redirects`, `/sitemap`.
   - Vendor product page patterns: URL ends in `-scraper.php`, `-scraping.php`, contains `-data-scraper.`, `-data-scraping.`, `/bots/`, `/extension/`, `/detail/` (extension detail pages).

6. Non-editorial page type. Inspect the WCC page body. Skip vendor product pages, pricing pages, login walls, sign-up pages, terms/legal pages, and pages with fewer than 400 words of body text. Word count <400 is the threshold; most editorial articles are 800+ words.

7. UGC slipped through. If the page URL contains `/forum/`, `/thread/`, `/comments/`, `/answers/`, `/q/`, `/topic/`, `/discussion/`, or the WCC body is structured as discussion replies, skip.

8. Category-fit gate (loose). Extract 4–6 category keywords from the user's product description (Step 1 input #4). These describe the product category, not the specific subject of the user's URL. Examples for a web-scraping product: `scrape`, `scraping`, `scraper`, `crawl`, `extract`, `data extraction`. For a CMS product: `cms`, `headless`, `content`, `editorial`. The row's WCC body must contain at least 1 of these category keywords. If not, skip with reason `Article isn't in user's product category (no '<kw>' match)`. This kills recipe blogs, finance articles, and other off-category content that slipped through SERP filtering.

   For non-English campaigns, include both source-language and English keywords in the category set; many Czech/German/French articles cite English brand names and product categories inline. Example for a Czech water-filtration brand: `{filtr, filtrace, vod, voda, filter, filtration, water}`. A pure-Czech keyword set would miss articles by Czech authors who write in mixed CS/EN.

   Known false negatives this rule can't catch (the per-row sub-agent in Step 8 must catch them):
   - Local e-commerce competitors selling the exact same product category. Past campaigns have seen multiple regional e-shops survive the mechanical pass, typically platform-based stores (e.g. Shoptet, Shopify) with "add to cart" buttons embedded in the article body. The sub-agents correctly skipped them, but the wasted compute is a smell. Future versions of this rule should detect platform fingerprints (platform bundle URLs, locale-specific add-to-cart strings, `/eshop/`, embedded product cards with prices in body) and pre-skip.
   - Category-name homonyms. "filtr" in Czech also means "filter" in the photography or coffee sense, so a coffee-filter or camera-filter blog would pass this gate but isn't a real fit. Sub-agent catches these by reading the body context.

   The category gate is intentionally loose. It is a category check, not a subject check: fine-grained "does this specific article fit my specific URL?" is delegated to the per-row sub-agent in Step 8. Example: for a user URL specifically about scraping a single travel site, a general "python web scraping" guide that never mentions that travel site passes this gate because it's in the user's category. The Step 8 sub-agent then decides whether to draft a placement (e.g., an additive line that names the specific travel site) or to recommend a content-based skip.

   Surface the extracted category keyword list to the user at the start of Step 6 and let them add/remove before the pass runs.

9. Adversarial-mention detection. When `brand_mentioned_in_source: true`, scan ±100 characters around the brand mention in the WCC body for negative-context tokens: `vs`, `versus`, `alternative to`, `alternatives to`, `compared to`, `compared with`, `instead of`, `better than`, `worse than`, `pros and cons`, `comparison`, `review of`. If any of those appear within the window, skip with reason `Adversarial mention (likely competitor comparison page) — won't link`. This catches "ScrapeHero vs Apify" footer mentions, "alternatives to YourBrand" listicles, and similar non-link contexts. Critical: without this rule, the `unlinked-mention-claim` outreach type fires on dozens of false positives.

10. No contact AND no editorial path. If `Contact Email = "Not found"` AND no `Article Author` AND no domain-level contact page found in WCC outbound links, skip; there is no one to pitch.

11. Invalid email (from verification). When `enableEmailVerification: true` ran, inspect each row's `Email Verification` status. If the primary contact's status is `invalid`, try the alternate contacts first before skipping the row; past runs have lost Tier A candidates because the primary contact's email was invalid but a verified alternate existed on the same domain. Only skip with reason `Email failed verification (invalid address)` when no alternate has a verified or unchecked email. Statuses `catch-all`, `risky`, `unknown` are informational only (not auto-skipped); surface them in the `Email Verification` column. Status `verified` ships as-is. If verification didn't run for this campaign, the column shows `-` and this rule is a no-op.
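The URL-pattern skip rule distinguishes a subdomain check from a substring check, which is easy to get wrong. The following sketch shows one way to do it with the standard library. The function name and reason strings are hypothetical, a host is assumed to carry a subdomain only when it has at least three labels, and the judgement-based cases (`dev.*` and the vendor product page patterns) are deliberately left out.

```python
from typing import Optional
from urllib.parse import urlsplit

DOC_SUBDOMAIN_PREFIXES = {
    "developers", "docs", "support", "helpcenter",
    "legacy", "dsarequests", "connectivity", "community",
}
API_DOC_PATHS = ("/reference/", "/docs/", "/spec/")
SKIP_PATH_PATTERNS = (
    "/api-docs/", "/reference/", "/marketplace/", "/extensions/", "/profile/",
    "/users/", "/free-tools/", "/spec/", "/content/privacy", "/content/terms",
    "/content/dma", "/content/how_we_work", "/legal/", "/_redirects", "/sitemap",
)


def url_pattern_skip_reason(url: str) -> Optional[str]:
    """Return a skip reason if the URL matches a pattern, else None."""
    parts = urlsplit(url)
    labels = (parts.hostname or "").lower().split(".")
    path = parts.path.lower()
    # Only the left-most label of a host with a subdomain counts as a prefix,
    # so "rapidapi.com" is never mistaken for an "api.*" host.
    prefix = labels[0] if len(labels) >= 3 else ""
    if prefix in DOC_SUBDOMAIN_PREFIXES:
        return f"URL pattern: {prefix}.* subdomain"
    if prefix == "api" and any(doc_path in path for doc_path in API_DOC_PATHS):
        return "URL pattern: api.* documentation path"
    for pattern in SKIP_PATH_PATTERNS:
        if pattern in path:
            return f"URL pattern: path contains {pattern}"
    return None
```

The three-label heuristic is a simplification; hosts under multi-part public suffixes (for example `.co.uk`) would need a public-suffix lookup to be classified correctly.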
Never suggest external lookup services or workaround tools in Notes: no `hunter.io`, no `LinkedIn search`, no third-party verification services. The skill's job is to surface what we found, factually. When information is missing (no email, no author, etc.), state the gap and stop. The user knows where to look; suggesting their tools back at them is condescending and clutters the output.

A row failing any rule above is skipped before Step 7. Skipped rows still appear in the final output (so the user can see what was filtered) but with empty placement and email cells and `Outreach Status = "Skip"` plus the reason in `Notes`.
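The adversarial-mention detection rule in this step is a windowed text scan, sketched below. The function name is hypothetical, and two details are assumptions rather than something the rule specifies: tokens are matched case-insensitively on word boundaries (so `vs` does not fire inside a longer word), and the brand string itself is excluded from the scanned context.

```python
import re

NEGATIVE_TOKENS = (
    "vs", "versus", "alternative to", "alternatives to", "compared to",
    "compared with", "instead of", "better than", "worse than",
    "pros and cons", "comparison", "review of",
)

# Assumption: whole-word, case-insensitive matching.
TOKEN_RE = re.compile(
    r"\b(?:" + "|".join(re.escape(token) for token in NEGATIVE_TOKENS) + r")\b",
    re.IGNORECASE,
)


def is_adversarial_mention(body: str, brand: str, window: int = 100) -> bool:
    """True if a negative-context token sits within `window` chars of any brand mention."""
    if not brand:
        return False
    for match in re.finditer(re.escape(brand), body, re.IGNORECASE):
        before = body[max(0, match.start() - window):match.start()]
        after = body[match.end():match.end() + window]
        if TOKEN_RE.search(before) or TOKEN_RE.search(after):
            return True
    return False


flagged = is_adversarial_mention("Read our ScrapeHero vs Apify comparison.", "Apify")
clean = is_adversarial_mention("We used Apify to collect the data for this study.", "Apify")
```

Scanning the text before and after the mention separately avoids accidentally forming a token out of words that only become adjacent once the brand name is removed.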
请勿在Notes中建议外部查找服务或变通工具——不要推荐、LinkedIn搜索或第三方验证服务。本技能的职责是如实展示我们找到的信息。当信息缺失(无邮箱、无作者等)时,说明缺口即可停止。用户知道去哪里查找;向他们推荐工具显得 condescending 且会使输出杂乱。
hunter.io任何违反上述规则的记录会在步骤7之前被筛选掉。被筛选的记录仍会出现在最终输出中(以便用户查看筛选内容),但链接放置和邮箱单元格为空,且并在Notes中添加原因。
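The invalid-email rule above can be sketched as a small fallback function. This is a minimal illustration, not the skill's actual code: the `email_verification` key and the `resolve_contact` helper are hypothetical names, and treating `-` as the "unchecked" status is an assumption.

```python
from typing import Optional

SKIP_REASON = "Email failed verification (invalid address)"

# Assumption: "unchecked" means the status cell is "-" (verification did not run).
USABLE_ALTERNATE_STATUSES = {"verified", "-"}


def resolve_contact(primary: dict, alternates: list[dict]) -> tuple[Optional[dict], Optional[str]]:
    """Return (contact_to_use, skip_reason); exactly one of the two is None."""
    # catch-all / risky / unknown / verified / "-" never trigger a skip.
    if primary.get("email_verification") != "invalid":
        return primary, None
    # Primary is invalid: fall back to the first usable alternate on the domain.
    for alternate in alternates:
        if alternate.get("email_verification") in USABLE_ALTERNATE_STATUSES:
            return alternate, None
    # No alternate qualifies, so the row is skipped with the fixed reason.
    return None, SKIP_REASON
```

A row is only skipped when the second element of the returned tuple is set; in every other case the returned contact becomes the row's primary contact.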
Step 7: "Why This Prospect" Tags

For every surviving row, compute one or two `Why This Prospect` tags, prioritised by which makes the strongest pitch. These tags drive the outreach-type template selection in Step 8.

| Tag | Trigger | Source of truth |
|---|---|---|
| `Mentions brand, no backlink` | | Mentions dataset |
| `Links to competitor` | WCC page body contains an outbound link whose host matches any | WCC dataset |
| `Top-3 SERP` | | Google Search Scraper sub-dataset |
| `Resource / roundup page` | WCC page body has 10+ outbound links AND the page title or H1 matches | WCC dataset |
| `Outdated content` | | Google Search Scraper / WCC |

A row may carry up to two tags. Order them by pitch strength using this priority: `Mentions brand, no backlink` > `Links to competitor` > `Resource / roundup page` > `Top-3 SERP` > `Outdated content`. If no tag fits, leave the column as `"-"`; the row still gets pitched, just without a special angle.
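The two-tag cap and the priority order can be expressed in a few lines. This is a sketch only: the `why_this_prospect` helper and the `"; "` separator are assumptions (the tag names themselves contain commas, so a comma separator would be ambiguous).

```python
# Priority order as stated in Step 7, strongest pitch first.
TAG_PRIORITY = [
    "Mentions brand, no backlink",
    "Links to competitor",
    "Resource / roundup page",
    "Top-3 SERP",
    "Outdated content",
]


def why_this_prospect(matched_tags: set[str]) -> str:
    """Render the column value: at most two tags, strongest first, or "-"."""
    ranked = [tag for tag in TAG_PRIORITY if tag in matched_tags]
    # "; " is a hypothetical separator chosen because tag names contain commas.
    return "; ".join(ranked[:2]) if ranked else "-"
```

Iterating over `TAG_PRIORITY` rather than sorting the input means unknown tags are silently ignored instead of raising.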
Step 8: Compose Per-Row Placement and Email
Each surviving row gets three placement artifacts plus one email draft. Apply these quality rules without exception:

1. No fabrication. If the article author or contact email is unknown, set the field to `"Not found"` and leave a one-line factual note (e.g., `"No email found for this contact"` or `"No author detected"`). Do not suggest external lookup tools or workarounds in Notes (see Step 6 rule 11 for the rationale). Just state the fact and stop.

2. Prioritise editorial-leaning contacts. When the All leads dataset returned multiple contacts for the same domain, prefer the one whose `jobTitle` matches `/editor|content|writer|managing|editorial|blog|copy/i`, and demote anyone whose `jobTitle` matches `/ceo|cfo|cto|founder|chief|vp\b|president/i` unless the company is a 1–5 person shop. Surface the chosen contact in the row; keep alternates in `Notes` as `"Alternate contacts: <name1> (<title>), <name2> (<title>)"`.

3. Three placement artifacts: try strategies in this priority order. Use the WCC sub-dataset's page text. Try strategies 1 → 2 → 3 in order; stop at the first one that produces a clean fit. Record which strategy was used by prepending the `Notes` field with `Placement: drop-in` / `Placement: additive` / `Placement: new insertion`.

   Strategy 1: drop-in (preferred). Find a sentence in the article where the user's URL can be added to existing words without changing any of the surrounding prose. The link goes on an existing word or short phrase the author already wrote. Output:
   - `Placement Source Sentence` = the verbatim sentence as it appears in the article.
   - `Placement With Link` = the same sentence with the link inserted on an existing word/phrase. No new prose, no rewording, no deletions. Example: source = `"Tools like Octoparse and BeautifulSoup work well for hotel data."` → with-link = `"Tools like Octoparse and **[BeautifulSoup](URL)** work well for hotel data."` (link added to existing word). The editor doesn't have to approve any new wording, just a hyperlink.
   - `Placement New Insertion` = `"-"`.

   Drop-in works when the article already names a brand, tool, or technique that maps cleanly to the user's URL. It's the lowest-friction ask of any outreach pattern: "could you add a hyperlink to a word you already wrote?"

   Strategy 2: additive (second choice). When no drop-in target exists but the article has a sentence the user's URL would naturally follow, keep the original sentence intact and add one new sentence after it. The new sentence introduces an adjacent reader-need that the article doesn't already cover and that the user's URL addresses. Output:
   - `Placement Source Sentence` = the verbatim original sentence.
   - `Placement With Link` = original sentence kept verbatim, followed by `→` and a one-sentence follow-on containing the link. Example: source = `"By integrating with Acme Travel's APIs, developers can enrich their platforms with hotel data."` → with-link = `"By integrating with Acme Travel's APIs, developers can enrich their platforms with hotel data. → ...with hotel data. In need of competitor pricing data the API doesn't expose? Then you need a [hotel-data scraper](URL)."` Keep the original sentence verbatim; the follow-on is the only new prose.
   - `Placement New Insertion` = `"-"`.

   The follow-on must (a) raise a reader need the existing sentence doesn't address, (b) connect that need to the user's URL, (c) be one sentence, ≤25 words, written in the article's voice.

   Strategy 3: new insertion (last resort). Only when neither drop-in nor additive works (e.g., no relevant sentence exists in the article body). Draft a fully new 1–2 sentence paragraph in the article's voice with a precise insertion location:
   - `Placement Source Sentence` = `"-"`.
   - `Placement With Link` = `"-"`.
   - `Placement New Insertion` = the drafted paragraph + the exact anchor (`"insert as a new paragraph immediately after the sentence ending in '…X.' in the section under H2 'Y'."`).

   If even a new insertion can't be drafted (the article is the wrong topic for the user's URL), set `Outreach Status = "Skip"` and add `Notes: "No natural placement — article topic mismatch"`. However, the topical-fit gate in Step 6 rule 8 should have caught this case already; if a row makes it to Step 8 and can't get a placement, treat that as a hint that the gate needs more keywords.

   Every surviving row goes through a sub-agent, not just Tier A/B/mention-only. The mechanical skip pass (Step 6) cuts the obviously bad prospects (competitor domains, doc subdomains, policy pages, dead-contact rows, off-category articles). Everything that passes is by definition a candidate worth real consideration, and the sub-agent makes the final fit call: read the article, attempt a placement (drop-in → additive → new insertion), and either draft the email or return `placement_strategy = "skip"` with a content-specific reason. Python templates / regex / keyword scoring are not acceptable for the final draft: they produce mechanical splices that read awkwardly in context (we've seen this fail in practice on real campaigns).

   Spawn sub-agents in parallel: one per surviving row, each given the WCC text, user URL context, contact info, brand voice, and partnership offer. The output schema (placement strategy, the three placement column values, email subject + body, skip recommendation, notes) is what gets merged back into the spreadsheet row.

   A row that the sub-agent decides to skip after content review gets `Outreach Status = "Skip"` and the agent's reason in Notes: same shape as a Step 6 mechanical skip, just with a more nuanced rationale.

4. Determine `Outreach Type` per row from the `Why This Prospect` tags + user goal:

   | Trigger | Outreach Type |
   |---|---|
   | Tag `Mentions brand, no backlink` present | `unlinked-mention-claim` |
   | Tag `Links to competitor` present | `competitor-link-replacement` |
   | Tag `Resource / roundup page` present | `resource-page-inclusion` |
   | Tag `Outdated content` present | `outdated-content-replacement` |
   | None of the above OR only tag `Top-3 SERP` | `topical-niche-edit` |

   Pull the matching template from `reference/email-templates.md`. The user's Step 2 `Partnership type` answer substitutes into the `{{offer_paragraph}}` placeholder inside the template: the outreach type determines structure and opening hook, the partnership type determines the offer.

5. `Suggested Email Copy` must use the user's brand voice. Apply the voice paragraph verbatim per the voice substitution rules in `reference/email-templates.md`. If voice input was skipped, use the generic-professional default and note this in `Notes`.

   5a. The email MUST include the exact placement wording, verbatim. The recipient should never have to ask "what's the wording you're suggesting?" or click through to a separate cell to see the proposed text. Embed the proposal directly:
   - For drop-in: quote both the source sentence and the linked version inline. Example: `"In your line 'Tools like X and Y work well for Z', would you turn 'Y' into a hyperlink to <URL>?"`
   - For additive: quote the anchor sentence verbatim AND the exact follow-on sentence you're proposing. Example: `"Right after your sentence 'X happens because Y.', would you add: 'For the Z case specifically, see <URL>.'?"`
   - For new insertion: quote the anchor sentence the new paragraph should follow, then the full proposed paragraph inline. Example: `"In the 'Honorable mentions' section, after 'each platform has its own trade-offs.', would you add this paragraph: '<full paragraph with link>'?"`

   The email is the ask. If the wording isn't in the email, the ask is incomplete. Vague phrasing like "happy to draft it for you" / "happy to send exact wording" / "a follow-on sentence linking to..." is a content-skill bug; always rewrite to include the verbatim proposal.

6. Word cap: emails are 150 words or less (subject + body combined).

7. Personalisation is mandatory. Every email must open with a concrete reference to the specific article (title + a one-line takeaway from its content). No generic "I loved your article" openers.
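The contact-prioritisation rule (rule 2) is mechanical enough to sketch in code. The two regular expressions are taken from the rule itself; everything else is an assumption made for illustration: the `choose_contact` helper, the `name` key, the three-level ranking, and the choice to let an editorial match win when a title matches both patterns (e.g. "Chief Editor").

```python
import re

# Patterns copied from rule 2; the `i` flag becomes re.IGNORECASE.
EDITORIAL = re.compile(r"editor|content|writer|managing|editorial|blog|copy", re.IGNORECASE)
EXECUTIVE = re.compile(r"ceo|cfo|cto|founder|chief|vp\b|president", re.IGNORECASE)


def contact_rank(job_title: str, small_company: bool) -> int:
    """Lower is better: 0 = editorial, 1 = neutral, 2 = demoted executive."""
    if EDITORIAL.search(job_title):
        return 0  # assumption: editorial wins when both patterns match
    if EXECUTIVE.search(job_title) and not small_company:
        return 2  # executives are not demoted at 1-5 person shops
    return 1


def choose_contact(contacts: list[dict], small_company: bool = False):
    """Return (chosen_contact, notes_line) for one domain's contacts."""
    if not contacts:
        return None, ""
    # sorted() is stable, so ties keep the dataset's original order.
    ordered = sorted(contacts, key=lambda c: contact_rank(c.get("jobTitle") or "", small_company))
    chosen, alternates = ordered[0], ordered[1:]
    note = ""
    if alternates:
        listed = ", ".join(f"{c['name']} ({c.get('jobTitle') or ''})" for c in alternates)
        note = f"Alternate contacts: {listed}"
    return chosen, note
```

One thing worth checking when adopting the patterns as written: the unanchored `cto` alternative also matches inside words such as "Director", so a title like "Marketing Director" would be demoted. Adding word boundaries may be preferable if that is not intended.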
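The Step 8 outreach-type table and the 150-word cap lend themselves to a short check. This is a sketch under stated assumptions: the function names are hypothetical, the first matching table row wins when a row carries two tags (the source does not say which should win), and words are counted by whitespace splitting.

```python
# Rows of the Step 8 trigger table, in table order.
OUTREACH_TYPE_BY_TAG = [
    ("Mentions brand, no backlink", "unlinked-mention-claim"),
    ("Links to competitor", "competitor-link-replacement"),
    ("Resource / roundup page", "resource-page-inclusion"),
    ("Outdated content", "outdated-content-replacement"),
]
DEFAULT_OUTREACH_TYPE = "topical-niche-edit"  # no tag, or only "Top-3 SERP"
WORD_CAP = 150  # subject + body combined


def outreach_type(tags: set[str]) -> str:
    """Map a row's Why This Prospect tags to its Outreach Type."""
    for tag, kind in OUTREACH_TYPE_BY_TAG:
        if tag in tags:  # assumption: first matching table row wins
            return kind
    return DEFAULT_OUTREACH_TYPE


def within_word_cap(subject: str, body: str) -> bool:
    """True when subject and body together stay within the cap."""
    # Assumption: a "word" is any whitespace-separated token.
    return len(f"{subject} {body}".split()) <= WORD_CAP
```

A check like `within_word_cap` is only a guard on the sub-agent's output; it does not replace the sub-agent's drafting.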
Step 9: Render Output
Markdown format (agent renders inline in chat):
- A header line with the Apify run ID and tier breakdown (`A: 8, B: 15, C: 7, Skipped: 12`).
- One Markdown table row per prospect with the most actionable columns (tier, why, contact, placement summary). Skipped rows render in a separate collapsed section at the bottom.
- Below the table, one fenced code block per non-skipped row containing the email draft (subject + body), labeled with the row index, tier, and outreach type.

xlsx format: `scripts/write_xlsx.py --config campaign.json` writes a 2-sheet workbook after Steps 5–8 finish.

The xlsx is written as two sheets:
- `Outreach`: active rows only, full 30 columns. This is the send-ready deliverable. Sorted by `Prospect Tier` ascending (A first), then by `Domain DR` descending, then by `SERP Position` ascending.
- `Skipped`: skipped rows with reduced columns: `Domain`, `Article URL`, `Article Title`, `Skip Reason` (extracted from Notes), `Source Engines`, `Why This Prospect`. This sheet exists for auditing what was filtered without cluttering the main view. Missing Ahrefs columns aren't visible here, so the empty-data confusion goes away.

The user opens the file and lands on `Outreach` by default, which holds only actionable prospects. They can switch to `Skipped` to audit. This pattern replaces the older single-sheet-with-red-rows approach.

Both formats also produce a sidecar `run_metadata.json`: `{ runId, actorId, startedAt, finishedAt, inputs, datasetIds, tierCounts, skipCounts }`. Drop it next to the main output file.
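The sheet split and sort order described above can be sketched as follows. This is not the code in `scripts/write_xlsx.py`; the helper names are hypothetical, rows are assumed to be dicts keyed by column name, and `Domain DR` is assumed to be numeric on every active row (which may not hold when Ahrefs data is missing).

```python
from collections import Counter


def split_for_workbook(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Return (outreach_rows, skipped_rows) in the documented order."""
    active = [r for r in rows if r["Outreach Status"] != "Skip"]
    skipped = [r for r in rows if r["Outreach Status"] == "Skip"]
    # Tier ascending (A first), Domain DR descending, SERP Position ascending.
    active.sort(key=lambda r: (r["Prospect Tier"], -r["Domain DR"], r["SERP Position"]))
    return active, skipped


def tier_breakdown(active: list[dict], skipped: list[dict]) -> str:
    """Build the header summary, e.g. "A: 8, B: 15, C: 7, Skipped: 12"."""
    counts = Counter(r["Prospect Tier"] for r in active)
    parts = [f"{tier}: {counts[tier]}" for tier in sorted(counts)]
    parts.append(f"Skipped: {len(skipped)}")
    return ", ".join(parts)
```

Negating `Domain DR` inside the sort key gives a descending order for that one column while keeping a single stable sort.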
Output Row Schema (30 columns)

1. `SERP Position`
2. `Source Engines`
3. `Keyword`
4. `Article Title`
5. `Article URL`
6. `Domain`
7. `Domain DR`
8. `Page Traffic`
9. `Referring Domains`
10. `Prospect Tier`
11. `Why This Prospect`
12. `Article Author`
13. `Author Source`
14. `Publish Date`
15. `Contact Full Name`
16. `Contact Job Title`
17. `Department`
18. `Seniority`
19. `Contact Email`
20. `Email Verification` (`verified`, `catch-all`, `risky`, `invalid`, `unknown`, `-`)
21. `Contact LinkedIn`
22. `Company`
23. `Outreach Type`
24. `Partnership Offer`
25. `Placement Source Sentence`
26. `Placement With Link`
27. `Placement New Insertion`
28. `Suggested Email Copy`
29. `Outreach Status` (`"Not started"`, `"Skip"`)
30. `Notes`

Full schema with types and source datasets per column is in `reference/output-formats.md`.

Error Handling
| Error / symptom | What to do |
|---|---|
| | Ask user to create |
| Ahrefs MCP unavailable | Skip Step 5. Set |
| | Run |
| | The user skipped Step 1 anchor #3. Re-ask brand name. |
| Actor run | Do not restart. Use |
| Actor run | See |
| | Expected for ~30% of pages without bylines. Skill writes |
| Sub-Actor datasets missing (no | Actor's output schema changed on or before 2026-05-20: |
| | Contact Details Scraper sub-Actor missed the site. Set the field to |
| All rows skipped by goal filter | The user's goal is too narrow for the SERP results. Suggest broadening the goal (e.g., |
| | Keyword too narrow, or |
| Costs higher than expected | Sub-Actor fan-out (Google Search Scraper, WCC, Contact Details Scraper, AI Web Scraper) stacks. See cost section in |