waterfall-enrichment

Waterfall Enrichment

The waterfall pattern runs multiple enrichment providers in sequence and stops as soon as one returns a valid result. This maximizes coverage while minimizing cost — you only pay for lookups that actually run.
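
The stop-at-first-hit behaviour can be sketched in a few lines of Python. This is only an illustration of the pattern, not how deepline implements it: `run_waterfall` and the three stand-in providers are hypothetical names, and a "valid result" is assumed to mean any non-empty value.

```python
from typing import Callable, Optional

# Hypothetical provider type: takes a lead row, returns a value or None.
Provider = Callable[[dict], Optional[str]]


def run_waterfall(
    row: dict, providers: list[tuple[str, Provider]]
) -> tuple[Optional[str], list[str]]:
    """Try providers in order and stop at the first non-empty result."""
    called = []
    for name, lookup in providers:
        called.append(name)  # every call recorded here is a lookup you pay for
        value = lookup(row)
        if value:  # None and "" both count as "no result"
            return value, called
    return None, called


# Stand-in providers, for illustration only.
def provider_a(row: dict) -> Optional[str]:
    return None  # simulates a miss


def provider_b(row: dict) -> Optional[str]:
    return f"{row['First Name'].lower()}@example.com"


def provider_c(row: dict) -> Optional[str]:
    return "never-reached@example.com"


value, called = run_waterfall(
    {"First Name": "Ada"},
    [("provider_a", provider_a), ("provider_b", provider_b), ("provider_c", provider_c)],
)
print(value, called)
```

Because `provider_b` returns a value, `provider_c` is never called, which is where the cost saving comes from.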

Key concepts

  • --with-waterfall <NAME>: start a waterfall block named <NAME>
  • --type: what you're looking for (email, phone, linkedin, first_name, last_name, full_name)
  • --result-getters: JSON path(s) where to find the value in provider output
  • --end-waterfall: close the block; the waterfall name becomes a column with the resolved value
  • After --end-waterfall, use {{<waterfall_name>}} to reference the resolved scalar
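
To make the role of --result-getters concrete, here is a minimal sketch of how a list of dotted paths could be resolved against a provider response. It assumes that paths are dot-separated, that numeric segments index into lists, and that the first non-empty match wins; deepline's actual resolution rules may differ, and `get_path` and `resolve` are hypothetical helper names.

```python
import json
from typing import Any, Optional


def get_path(payload: Any, path: str) -> Any:
    """Walk a dotted path; numeric segments index into lists."""
    current = payload
    for segment in path.split("."):
        if isinstance(current, dict):
            current = current.get(segment)
        elif isinstance(current, list) and segment.isdigit() and int(segment) < len(current):
            current = current[int(segment)]
        else:
            return None
        if current is None:
            return None
    return current


def resolve(payload: Any, getters: list[str]) -> Optional[Any]:
    """Return the first non-empty value found by any getter, in order."""
    for path in getters:
        value = get_path(payload, path)
        if value not in (None, "", [], {}):
            return value
    return None


# Same getter list as the email examples in this document.
getters = json.loads('["data.email","email","data.0.email"]')

print(resolve({"data": {"email": "a@example.com"}}, getters))    # object under "data"
print(resolve({"data": [{"email": "b@example.com"}]}, getters))  # list under "data"
print(resolve({"status": "not_found"}, getters))                 # no getter matches
```

Listing several getters lets one waterfall cope with providers that nest the same field differently.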

Always pilot first

bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "email" \
  --type email \
  --result-getters '["data.email","email","data.0.email"]' \
  --with 'provider_a=tool_name:{"param":"{{Column}}"}' \
  --with 'provider_b=tool_name:{"param":"{{Column}}"}' \
  --end-waterfall
Review the output, then scale to --rows 1: for the remaining rows.

Phone waterfall

bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "phone" \
  --type phone \
  --result-getters '["data.phone","phone","mobile","data.mobile"]' \
  --with 'mobile_finder=leadmagic_mobile_finder:{"email":"{{Email}}","first_name":"{{First Name}}","last_name":"{{Last Name}}","company":"{{Company}}"}' \
  --end-waterfall

Email waterfall (name + company)

bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "email" \
  --type email \
  --result-getters '["data.email","email","data.0.email"]' \
  --with 'apollo_match=apollo_people_match:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","organization_name":"{{Company}}"}' \
  --with 'crust_profile=crustdata_person_enrichment:{"linkedinProfileUrl":"{{LinkedIn}}","fields":["email","current_employers"],"enrichRealtime":true}' \
  --with 'pdl_enrich=peopledatalabs_enrich_contact:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","domain":"{{Company Domain}}"}' \
  --end-waterfall \
  --with 'email_validation=leadmagic_email_validation:{"email":"{{email}}"}'

LinkedIn URL waterfall

bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "linkedin" \
  --type linkedin \
  --result-getters '["linkedin_url","data.linkedin_url","data.0.linkedin_url"]' \
  --with 'apollo_match=apollo_people_match:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","organization_name":"{{Company}}"}' \
  --with 'pdl_identify=peopledatalabs_person_identify:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","company":"{{Company}}"}' \
  --end-waterfall

Pre-flight validation

Before running any waterfall, validate your input data:
bash
# Check for required columns and empty values
python3 -c "
import csv
from collections import Counter

with open('leads.csv', newline='') as f:
    rows = list(csv.DictReader(f))
cols = rows[0].keys() if rows else []
print(f'Columns: {list(cols)}')
print(f'Total rows: {len(rows)}')

# Check for empty required fields (a short row yields None, so fall back to '')
for col in ['First Name', 'Last Name', 'Company']:
    empty = sum(1 for r in rows if not (r.get(col) or '').strip())
    if empty:
        print(f'WARNING: {empty} rows missing {col}')

# Check for duplicates (same person appears multiple times)
keys = [tuple((r.get(c) or '').strip().lower() for c in ('First Name', 'Last Name', 'Company')) for r in rows]
dupes = {k: v for k, v in Counter(keys).items() if v > 1}
if dupes:
    print(f'WARNING: {len(dupes)} duplicate contacts; deduplicate before enrichment to avoid paying for the same lookup twice')
"

**Skip rows that can't match:** If a row is missing both email AND name+company, no provider will find a match. Remove these before running to save credits.
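
One way to apply the skip rule is to split the input into a matchable file and a skipped file before enrichment. The sketch below assumes the column names used in the examples in this document (`Email`, `First Name`, `Last Name`, `Company`) and reads "name+company" as first name, last name, and company all present; `is_matchable`, `split_rows`, and the output file names are hypothetical.

```python
import csv


def is_matchable(row: dict) -> bool:
    """A row is matchable if it has an email, or a full name plus company."""

    def filled(col: str) -> bool:
        return bool((row.get(col) or "").strip())

    has_email = filled("Email")
    has_name_company = filled("First Name") and filled("Last Name") and filled("Company")
    return has_email or has_name_company


def split_rows(src: str, kept_path: str, skipped_path: str) -> tuple[int, int]:
    """Write matchable rows to kept_path and all other rows to skipped_path."""
    with open(src, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = reader.fieldnames or []
        rows = list(reader)

    kept = [r for r in rows if is_matchable(r)]
    skipped = [r for r in rows if not is_matchable(r)]

    for path, subset in ((kept_path, kept), (skipped_path, skipped)):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(subset)
    return len(kept), len(skipped)
```

For example, `split_rows("leads.csv", "leads_matchable.csv", "leads_skipped.csv")` returns the two row counts and leaves the original file untouched; keeping the skipped rows in their own file makes it easy to review what was excluded.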

Realistic coverage expectations

Don't assume 100% fill rates. Typical coverage by number of providers in the waterfall:

| Data type | Single provider | 2-provider waterfall | 3-provider waterfall |
| --- | --- | --- | --- |
| Email | ~50% | ~65% | ~75% |
| Phone | ~30% | ~40% | ~45% |
| LinkedIn URL | ~65% | ~75% | ~85% |

Set expectations with stakeholders before running. Expect diminishing returns after 2-3 providers.
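
The coverage figures can be turned into per-provider gains and expected row counts with a little arithmetic. The rates below are the approximate values quoted above; the 1,000-row list size and the helper names are illustrative assumptions.

```python
# Approximate fill rates quoted in this section, as fractions of rows filled
# with 1, 2, and 3 providers in the waterfall.
coverage = {
    "Email": [0.50, 0.65, 0.75],
    "Phone": [0.30, 0.40, 0.45],
    "LinkedIn URL": [0.65, 0.75, 0.85],
}


def marginal_gains(rates: list[float]) -> list[float]:
    """Extra coverage contributed by each provider, in waterfall order."""
    gains = [rates[0]]
    for prev, cur in zip(rates, rates[1:]):
        gains.append(round(cur - prev, 2))
    return gains


def expected_filled(total_rows: int, rate: float) -> int:
    """Expected number of rows with a resolved value at a given fill rate."""
    return round(total_rows * rate)


for data_type, rates in coverage.items():
    print(data_type, marginal_gains(rates), expected_filled(1000, rates[-1]))
```

For email, the first provider contributes about 50 points of coverage, the second about 15, and the third about 10, so each added provider is paid for on more rows while filling fewer of them.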

Hard rules

  • Always end with leadmagic_email_validation when the waterfall resolves email
  • Use a --rows 0:1 pilot before any full run
  • Do not reuse an existing output CSV path
  • Chain one waterfall at a time; close it with --end-waterfall before starting another
  • Use canonical --type values: email | phone | linkedin | first_name | last_name | full_name
  • Never call enrichment without minimum data: email waterfalls need name + company OR LinkedIn URL. Phone waterfalls need a verified email. Don't waste credits on rows missing required fields.
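
The minimum-data rule can be expressed as a per-row gate that runs before any provider is called. This is a sketch under assumptions: the column names follow the examples in this document, "name" is read as first plus last name, and `Email Verified` is a hypothetical column holding the outcome of an earlier validation step (the document does not say where verification status is stored).

```python
def filled(row: dict, col: str) -> bool:
    """True if the column exists and holds a non-blank value."""
    return bool((row.get(col) or "").strip())


def has_minimum_data(row: dict, waterfall_type: str) -> bool:
    """Decide whether a row is worth spending credits on for this waterfall."""
    if waterfall_type == "email":
        name_company = all(filled(row, c) for c in ("First Name", "Last Name", "Company"))
        return name_company or filled(row, "LinkedIn")
    if waterfall_type == "phone":
        # "Email Verified" is a hypothetical column, not a deepline output name.
        verified = (row.get("Email Verified") or "").strip().lower() == "true"
        return filled(row, "Email") and verified
    raise ValueError(f"no minimum-data rule defined for type: {waterfall_type}")


print(has_minimum_data({"First Name": "Ada", "Last Name": "Lovelace", "Company": "Analytical"}, "email"))
print(has_minimum_data({"Email": "ada@example.com"}, "phone"))
```

Raising on an unknown type, rather than returning `True`, keeps a typo in the type name from silently letting every row through.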

After a waterfall run

The Playground auto-opens for inspection:
bash
deepline playground start --csv leads.csv --open
Use --rows 0:1 in the playground to re-run a single block for debugging.

Related skills

This skill teaches the waterfall pattern. For specific enrichment tasks, use:
  • Finding emails: contact-to-email (pre-built email waterfalls)
  • Finding LinkedIn URLs: linkedin-url-lookup (with nickname expansion + validation)
  • Finding contacts at companies: get-leads-at-company
  • Building prospect lists: build-tam

Get started

Sign up and get your API key at code.deepline.com.
bash
npm install -g @deepline/cli
deepline auth login