waterfall-enrichment

Waterfall Enrichment

The waterfall pattern runs multiple enrichment providers in sequence and stops as soon as one returns a valid result. This maximizes coverage while minimizing cost — you only pay for lookups that actually run.
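
The control flow described above can be sketched in a few lines of Python. This is only an illustration of the pattern, not the CLI's implementation: `run_waterfall` and the three stand-in providers are hypothetical names, and "valid" is assumed to mean any non-empty value.

```python
from typing import Callable, Optional

# A provider is modelled as a function that takes a row and returns a value or None.
Provider = Callable[[dict], Optional[str]]


def run_waterfall(row: dict, providers: list[tuple[str, Provider]]) -> tuple[Optional[str], list[str]]:
    """Try providers in order and stop at the first non-empty result.

    Returns the resolved value plus the names of the providers actually called,
    which is what you would be billed for.
    """
    called = []
    for name, lookup in providers:
        called.append(name)
        value = lookup(row)
        if value:  # None or "" counts as "no result"; keep going
            return value, called
    return None, called


# Hypothetical stand-in providers, for illustration only.
def provider_a(row: dict) -> Optional[str]:
    return None  # no match


def provider_b(row: dict) -> Optional[str]:
    return f"{row['First Name'].lower()}@example.com"


def provider_c(row: dict) -> Optional[str]:
    raise AssertionError("never reached: provider_b already resolved the value")


value, called = run_waterfall(
    {"First Name": "Ada"},
    [("provider_a", provider_a), ("provider_b", provider_b), ("provider_c", provider_c)],
)
print(value, called)
```

Because `provider_b` resolves the value, `provider_c` is never invoked, which is where the cost saving comes from.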

Key concepts

- `--with-waterfall <NAME>`: start a waterfall block named `<NAME>`
- `--type`: what you're looking for: `email`, `phone`, `linkedin`, `first_name`, `last_name`, `full_name`
- `--result-getters`: JSON path(s) where the value is found in provider output
- `--end-waterfall`: close the block; the waterfall name becomes a column with the resolved value
- After `--end-waterfall`, use `{{<waterfall_name>}}` to reference the resolved scalar
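
To make the `--result-getters` idea concrete, here is a minimal Python sketch of how a list of dotted paths might be tried against a provider response. The path semantics are an assumption (dots separate keys, and a numeric segment such as `0` indexes into a list); `get_path` and `first_match` are hypothetical helpers, not part of the CLI.

```python
from typing import Any


def get_path(payload: Any, path: str) -> Any:
    """Walk a dotted path; numeric segments index into lists (assumed semantics)."""
    current = payload
    for segment in path.split("."):
        if isinstance(current, dict):
            if segment not in current:
                return None
            current = current[segment]
        elif isinstance(current, list) and segment.isdigit():
            index = int(segment)
            if index >= len(current):
                return None
            current = current[index]
        else:
            return None
    return current


def first_match(payload: Any, getters: list[str]) -> Any:
    """Return the first non-empty value found by trying each getter in order."""
    for path in getters:
        value = get_path(payload, path)
        if value not in (None, ""):
            return value
    return None


getters = ["data.email", "email", "data.0.email"]

# Three response shapes that different providers could plausibly return.
print(first_match({"data": {"email": "a@example.com"}}, getters))
print(first_match({"email": "b@example.com"}, getters))
print(first_match({"data": [{"email": "c@example.com"}]}, getters))
print(first_match({"status": "not_found"}, getters))
```

Listing several getters lets one waterfall cope with providers that nest the same field differently.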

Always pilot first

```bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "email" \
  --type email \
  --result-getters '["data.email","email","data.0.email"]' \
  --with 'provider_a=tool_name:{"param":"{{Column}}"}' \
  --with 'provider_b=tool_name:{"param":"{{Column}}"}' \
  --end-waterfall
```

Review the output, then scale to `--rows 1:` for the remaining rows.

Phone waterfall

```bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "phone" \
  --type phone \
  --result-getters '["data.phone","phone","mobile","data.mobile"]' \
  --with 'mobile_finder=leadmagic_mobile_finder:{"email":"{{Email}}","first_name":"{{First Name}}","last_name":"{{Last Name}}","company":"{{Company}}"}' \
  --end-waterfall
```

Email waterfall (name + company)

```bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "email" \
  --type email \
  --result-getters '["data.email","email","data.0.email"]' \
  --with 'apollo_match=apollo_people_match:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","organization_name":"{{Company}}"}' \
  --with 'crust_profile=crustdata_person_enrichment:{"linkedinProfileUrl":"{{LinkedIn}}","fields":["email","current_employers"],"enrichRealtime":true}' \
  --with 'pdl_enrich=peopledatalabs_enrich_contact:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","domain":"{{Company Domain}}"}' \
  --end-waterfall \
  --with 'email_validation=leadmagic_email_validation:{"email":"{{email}}"}'
```

LinkedIn URL waterfall

```bash
deepline enrich --input leads.csv --in-place --rows 0:1 \
  --with-waterfall "linkedin" \
  --type linkedin \
  --result-getters '["linkedin_url","data.linkedin_url","data.0.linkedin_url"]' \
  --with 'apollo_match=apollo_people_match:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","organization_name":"{{Company}}"}' \
  --with 'pdl_identify=peopledatalabs_person_identify:{"first_name":"{{First Name}}","last_name":"{{Last Name}}","company":"{{Company}}"}' \
  --end-waterfall
```

Pre-flight validation

Before running any waterfall, validate your input data:

```bash
# Check for required columns and empty values
python3 -c "
import csv
from collections import Counter

with open('leads.csv', newline='') as f:
    rows = list(csv.DictReader(f))
cols = rows[0].keys() if rows else []
print(f'Columns: {list(cols)}')
print(f'Total rows: {len(rows)}')

# Check for empty required fields (short rows yield None, so fall back to '')
for col in ['First Name', 'Last Name', 'Company']:
    empty = sum(1 for r in rows if not (r.get(col) or '').strip())
    if empty:
        print(f'WARNING: {empty} rows missing {col}')

# Check for duplicates (same person appears multiple times)
keys = [
    tuple((r.get(c) or '').strip().lower() for c in ['First Name', 'Last Name', 'Company'])
    for r in rows
]
dupes = {k: v for k, v in Counter(keys).items() if v > 1}
if dupes:
    print(f'WARNING: {len(dupes)} duplicate contacts; deduplicate before enrichment to avoid paying for the same lookup twice')
"
```


**Skip rows that can't match:** If a row is missing both email AND name+company, no provider will find a match. Remove these before running to save credits.


Realistic coverage expectations

Don't assume 100% fill rates. Approximate coverage by number of providers in the waterfall:

| Data type | Single provider | 2-provider waterfall | 3-provider waterfall |
| --- | --- | --- | --- |
| Email | ~50% | ~65% | ~75% |
| Phone | ~30% | ~40% | ~45% |
| LinkedIn URL | ~65% | ~75% | ~85% |

Set expectations with stakeholders before running. Returns diminish after 2-3 providers.
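
The arithmetic behind these numbers can be worked through with a small Python sketch. It uses only the approximate rates from the table above; `marginal_gains` and `expected_fills` are hypothetical helper names, and the 1,000-row list is an illustrative assumption.

```python
# Approximate cumulative coverage for 1, 2, and 3 providers (from the table above).
coverage = {
    "Email": [0.50, 0.65, 0.75],
    "Phone": [0.30, 0.40, 0.45],
    "LinkedIn URL": [0.65, 0.75, 0.85],
}


def marginal_gains(rates: list[float]) -> list[float]:
    """Coverage added by each provider in the sequence."""
    gains = [rates[0]]
    for previous, current in zip(rates, rates[1:]):
        gains.append(round(current - previous, 2))
    return gains


def expected_fills(total_rows: int, rates: list[float]) -> list[int]:
    """Rows you can expect to fill at each waterfall depth."""
    return [round(total_rows * rate) for rate in rates]


for data_type, rates in coverage.items():
    print(data_type, marginal_gains(rates), expected_fills(1000, rates))
```

For email, the first provider contributes about 50 points of coverage, the second about 15, and the third about 10, so each additional provider costs the same per lookup but fills fewer new rows.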

Hard rules

- Always end with `leadmagic_email_validation` when the waterfall resolves email
- Use a `--rows 0:1` pilot before any full run
- Do not reuse an existing output CSV path
- Chain one waterfall at a time; close it with `--end-waterfall` before starting another
- Use canonical `--type` values: `email | phone | linkedin | first_name | last_name | full_name`
- Never call enrichment without minimum data: email waterfalls need name + company OR LinkedIn URL. Phone waterfalls need a verified email. Don't waste credits on rows missing required fields.
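
The minimum-data rule can be checked per row before spending credits. The sketch below is one way to express it in Python, assuming the column names used in the examples on this page (`First Name`, `Last Name`, `Company`, `LinkedIn`, `Email`); `has_value` and `is_eligible` are hypothetical helpers. It only checks that an email is present; whether that email is verified has to be established separately.

```python
def has_value(row: dict, column: str) -> bool:
    """True when the column exists and holds a non-blank string."""
    return bool((row.get(column) or "").strip())


def is_eligible(row: dict, waterfall_type: str) -> bool:
    """Apply the minimum-data rule for the given waterfall type."""
    if waterfall_type == "email":
        has_name_and_company = all(
            has_value(row, column) for column in ("First Name", "Last Name", "Company")
        )
        return has_name_and_company or has_value(row, "LinkedIn")
    if waterfall_type == "phone":
        # Presence only; verification status is not visible from the CSV alone.
        return has_value(row, "Email")
    raise ValueError(f"no minimum-data rule defined here for type: {waterfall_type}")


rows = [
    {"First Name": "Ada", "Last Name": "Lovelace", "Company": "Analytical Engines"},
    {"First Name": "Grace", "Last Name": "", "Company": "", "LinkedIn": "https://www.linkedin.com/in/example"},
    {"First Name": "Alan", "Last Name": "", "Company": ""},
]

email_ready = [row for row in rows if is_eligible(row, "email")]
phone_ready = [row for row in rows if is_eligible(row, "phone")]
print(len(email_ready), len(phone_ready))
```

Rows that fail the check can be written to a separate file for manual review instead of being sent to providers.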

After a waterfall run

The Playground auto-opens for inspection:

```bash
deepline playground start --csv leads.csv --open
```

Use `--rows 0:1` in the playground to re-run a single block for debugging.

Related skills

Get started

```bash
npm install -g @deepline/cli
deepline auth login
```

Sign up at code.deepline.com and get your API key.
