waterfall-enrichment
Waterfall Enrichment
The waterfall pattern runs multiple enrichment providers in sequence and stops as soon as one returns a valid result. This maximizes coverage while minimizing cost — you only pay for lookups that actually run.
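The control flow described above can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not Deepline's implementation: `run_waterfall` and the three stand-in providers are hypothetical names, and "valid result" is assumed to mean any non-empty value.

```python
from typing import Callable, Optional

# Hypothetical provider type: takes a lead row, returns a value or None.
Provider = Callable[[dict], Optional[str]]


def run_waterfall(row: dict, providers: list[tuple[str, Provider]]) -> tuple[Optional[str], list[str]]:
    """Try providers in order and stop at the first non-empty result."""
    called = []
    for name, lookup in providers:
        called.append(name)  # this lookup actually runs, so it is one you pay for
        value = lookup(row)
        if value:  # assumption: None and "" both mean "no result"
            return value, called
    return None, called


# Stand-in providers for illustration only.
def provider_a(row: dict) -> Optional[str]:
    return None


def provider_b(row: dict) -> Optional[str]:
    return "jane@example.com"


def provider_c(row: dict) -> Optional[str]:
    return "never-reached@example.com"


value, called = run_waterfall(
    {"First Name": "Jane", "Last Name": "Doe", "Company": "Example Inc"},
    [("provider_a", provider_a), ("provider_b", provider_b), ("provider_c", provider_c)],
)
print(value, called)
```

Because `provider_b` returns a value, `provider_c` is never called, which is where the cost saving comes from.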
Key concepts
- `--with-waterfall <NAME>`: start a waterfall block named `<NAME>`
- `--type`: what you're looking for: `email`, `phone`, `linkedin`, `first_name`, `last_name`, `full_name`
- `--result-getters`: JSON path(s) where to find the value in provider output
- `--end-waterfall`: close the block; the waterfall name becomes a column with the resolved value
- After `--end-waterfall`, use `{{<waterfall_name>}}` to reference the resolved scalar
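To make the role of `--result-getters` concrete, here is a rough Python sketch of how a list of dotted paths might be tried against a provider's JSON output. The exact resolution rules of the CLI are not documented in this section, so `get_path` and `first_match` are hypothetical helpers, and treating numeric segments such as `0` as list indexes is an assumption.

```python
from typing import Any, Optional


def get_path(payload: Any, path: str) -> Any:
    """Follow a dotted path; numeric segments index into lists (assumed behavior)."""
    current = payload
    for segment in path.split("."):
        if isinstance(current, dict):
            current = current.get(segment)
        elif isinstance(current, list) and segment.isdigit() and int(segment) < len(current):
            current = current[int(segment)]
        else:
            return None
        if current is None:
            return None
    return current


def first_match(payload: Any, getters: list[str]) -> Optional[Any]:
    """Return the value from the first getter path that yields something non-empty."""
    for path in getters:
        value = get_path(payload, path)
        if value not in (None, ""):
            return value
    return None


getters = ["data.email", "email", "data.0.email"]
print(first_match({"data": {"email": "a@example.com"}}, getters))
print(first_match({"email": "b@example.com"}, getters))
print(first_match({"data": [{"email": "c@example.com"}]}, getters))
print(first_match({"status": "not_found"}, getters))
```

Listing several paths lets one waterfall handle providers that nest the same field differently.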
Always pilot first
bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
--with-waterfall "email" \
--type email \
--result-getters '["data.email","email","data.0.email"]' \
--with 'provider_a=tool_name:{"param":"{{Column}}"}' \
--with 'provider_b=tool_name:{"param":"{{Column}}"}' \
--end-waterfall
Review the output, then scale to `--rows 1:` for the remaining rows.
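If you generate these commands from a script, one way to keep the pilot and the full run identical apart from `--rows` is to build the argument list once. The sketch below only assembles and prints the command and never runs it. `build_enrich_args` is a hypothetical helper, the flags are the ones shown in this document, and the `provider_a`/`tool_name` entries are the same placeholders as in the example above.

```python
import json
import shlex

# Canonical --type values as listed in this document.
CANONICAL_TYPES = {"email", "phone", "linkedin", "first_name", "last_name", "full_name"}


def build_enrich_args(input_csv, rows, name, value_type, getters, providers):
    """Assemble a `deepline enrich` argument list for one waterfall block."""
    if value_type not in CANONICAL_TYPES:
        raise ValueError(f"unknown --type value: {value_type}")
    args = [
        "deepline", "enrich", "--input", input_csv, "--in-place", "--rows", rows,
        "--with-waterfall", name,
        "--type", value_type,
        "--result-getters", json.dumps(getters, separators=(",", ":")),
    ]
    for alias, tool, params in providers:
        payload = json.dumps(params, separators=(",", ":"))  # avoids hand-quoting JSON
        args += ["--with", f"{alias}={tool}:{payload}"]
    args.append("--end-waterfall")
    return args


providers = [
    ("provider_a", "tool_name", {"param": "{{Column}}"}),
    ("provider_b", "tool_name", {"param": "{{Column}}"}),
]
getters = ["data.email", "email", "data.0.email"]

pilot = build_enrich_args("leads.csv", "0:1", "email", "email", getters, providers)
remaining = build_enrich_args("leads.csv", "1:", "email", "email", getters, providers)

print(shlex.join(pilot))      # review this run's output first
print(shlex.join(remaining))  # then run the rest
```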
Phone waterfall
bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
--with-waterfall "phone" \
--type phone \
--result-getters '["data.phone","phone","mobile","data.mobile"]' \
--with 'mobile_finder=leadmagic_mobile_finder:{"email":"{{Email}}","first_name":"{{First Name}}","last_name":"{{Last Name}}","company":"{{Company}}"}' \
--end-waterfall
Email waterfall (name + company)
bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
--with-waterfall "email" \
--type email \
--result-getters '["data.email","email","data.0.email"]' \
--with 'apollo_match=apollo_people_match:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","organization_name":"{{Company}}"}' \
--with 'crust_profile=crustdata_person_enrichment:{"linkedinProfileUrl":"{{LinkedIn}}","fields":["email","current_employers"],"enrichRealtime":true}' \
--with 'pdl_enrich=peopledatalabs_enrich_contact:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","domain":"{{Company Domain}}"}' \
--end-waterfall \
--with 'email_validation=leadmagic_email_validation:{"email":"{{email}}"}'
LinkedIn URL waterfall
bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
--with-waterfall "linkedin" \
--type linkedin \
--result-getters '["linkedin_url","data.linkedin_url","data.0.linkedin_url"]' \
--with 'apollo_match=apollo_people_match:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","organization_name":"{{Company}}"}' \
--with 'pdl_identify=peopledatalabs_person_identify:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","company":"{{Company}}"}' \
--end-waterfall
Pre-flight validation
Before running any waterfall, validate your input data:
bash
# Check for required columns and empty values
python3 -c "
import csv
from collections import Counter

with open('leads.csv', newline='') as f:
    rows = list(csv.DictReader(f))
cols = rows[0].keys() if rows else []
print(f'Columns: {list(cols)}')
print(f'Total rows: {len(rows)}')

# Check for empty required fields (short rows yield None, hence the 'or')
for col in ['First Name', 'Last Name', 'Company']:
    empty = sum(1 for r in rows if not (r.get(col) or '').strip())
    if empty: print(f'WARNING: {empty} rows missing {col}')

# Check for duplicates (same person appears multiple times)
keys = [((r.get('First Name') or '').strip().lower(), (r.get('Last Name') or '').strip().lower(), (r.get('Company') or '').strip().lower()) for r in rows]
dupes = {k: v for k, v in Counter(keys).items() if v > 1}
if dupes: print(f'WARNING: {len(dupes)} duplicate contacts; deduplicate before enrichment to avoid paying for the same lookup twice')
"
**Skip rows that can't match:** If a row is missing both email AND name+company, no provider will find a match. Remove these before running to save credits.
Realistic coverage expectations
Don't assume 100% fill rates. Approximate coverage by number of providers in the waterfall:
| Data type | Single provider | 2-provider waterfall | 3-provider waterfall |
|---|---|---|---|
| Email | ~50% | ~65% | ~75% |
| Phone | ~30% | ~40% | ~45% |
| LinkedIn URL | ~65% | ~75% | ~85% |
Set expectations with stakeholders before running. Returns diminish after 2-3 providers: in the table above, the third provider adds only about 5 to 10 points of coverage.
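To turn these percentages into numbers you can share with stakeholders, the arithmetic can be sketched as follows. The rates are the approximate figures from the table above, not guarantees, and `expected_fills` is a hypothetical helper written for this illustration.

```python
# Approximate fill rates (percent) from the table above, indexed by the
# number of providers in the waterfall (1, 2, 3).
COVERAGE_PCT = {
    "email": [50, 65, 75],
    "phone": [30, 40, 45],
    "linkedin": [65, 75, 85],
}


def expected_fills(total_rows, data_type):
    """Return (provider_count, expected_filled_rows, gain_over_previous) tuples."""
    results = []
    previous = 0
    for count, pct in enumerate(COVERAGE_PCT[data_type], start=1):
        filled = total_rows * pct // 100  # integer math keeps the estimate a whole row count
        results.append((count, filled, filled - previous))
        previous = filled
    return results


for count, filled, gain in expected_fills(1000, "email"):
    print(f"{count} provider(s): ~{filled} emails found (+{gain})")
```

For 1,000 leads, the estimate is roughly 500, 650, and 750 emails with one, two, and three providers, so each added provider contributes fewer new matches than the one before it.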
Hard rules
- Always end with `leadmagic_email_validation` when the waterfall resolves email
- Use a `--rows 0:1` pilot before any full run
- Do not reuse an existing output CSV path
- Chain one waterfall at a time; close with `--end-waterfall` before starting another
- Use canonical `--type` values: `email | phone | linkedin | first_name | last_name | full_name`
- Never call enrichment without minimum data: email waterfalls need name + company OR LinkedIn URL. Phone waterfalls need a verified email. Don't waste credits on rows missing required fields.
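The minimum-data rule can be checked per row before any credits are spent. The sketch below is one possible way to do it under stated assumptions: the column names (`First Name`, `Last Name`, `Company`, `LinkedIn`, `Email`) follow the templates used in this document's examples, the helper names are hypothetical, and the set of verified emails is assumed to come from a separate validation step that is not shown.

```python
def has_value(row, column):
    """True when the column exists and is not blank."""
    return bool((row.get(column) or "").strip())


def can_run_email_waterfall(row):
    """Email waterfalls need name + company OR a LinkedIn URL."""
    has_name_and_company = all(
        has_value(row, c) for c in ("First Name", "Last Name", "Company")
    )
    return has_name_and_company or has_value(row, "LinkedIn")


def can_run_phone_waterfall(row, verified_emails):
    """Phone waterfalls need an email that has already been verified."""
    email = (row.get("Email") or "").strip().lower()
    return bool(email) and email in verified_emails


rows = [
    {"First Name": "Jane", "Last Name": "Doe", "Company": "Example Inc", "Email": "jane@example.com"},
    {"First Name": "", "Last Name": "", "Company": "", "LinkedIn": "https://www.linkedin.com/in/example"},
    {"First Name": "Sam", "Last Name": "", "Company": "", "Email": ""},
]
verified = {"jane@example.com"}  # assumption: output of an earlier validation pass

email_ready = [r for r in rows if can_run_email_waterfall(r)]
phone_ready = [r for r in rows if can_run_phone_waterfall(r, verified)]
print(len(email_ready), len(phone_ready))
```

Rows that fail both checks can be written to a separate file instead of being sent to any provider.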
After a waterfall run
The Playground auto-opens for inspection:
bash
deepline playground start --csv leads.csv --open
Use `--rows 0:1` in the playground to re-run a single block for debugging.
Related skills
This skill teaches the waterfall pattern. For specific enrichment tasks, use:
- Finding emails → `contact-to-email` (pre-built email waterfalls)
- Finding LinkedIn URLs → `linkedin-url-lookup` (with nickname expansion + validation)
- Finding contacts at companies → `get-leads-at-company`
- Building prospect lists → `build-tam`
Get started
Sign up and get your API key at code.deepline.com.
bash
npm install -g @deepline/cli
deepline auth login