data-enrichment
Prereq: read `bulk-operations/SKILL.md` first; JSONL piping, dry-run/digest, history, and rate-limit hygiene live there. This skill is the upsert-by-natural-key workflow on top.

The core move: upsert, not search-then-create
Use `hubspot objects upsert --type X --id-property <natural-key>` in place of a `search` followed by a `create`.

Per line in: `{"id":"jane@example.com","properties":{"firstname":"Jane","jobtitle":"VP"}}`

Per line out: `{"id":"123","ok":true,"data":{...,"new":true|false}}` or `{"ok":false,"error":{...}}`. Order matches input.
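Because results come back in input order, a failed result line can be traced to the input line at the same position, even though the error shape shown above carries no `id`. The following is a minimal sketch of that pairing. `failed_inputs` and its two file arguments are hypothetical names, not part of the `hubspot` CLI, and it assumes both files hold compact one-line JSON (as `jq -c` emits), so no line contains a raw tab.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the hubspot CLI).
# Prints every input line whose result line, at the same position, reports "ok":false.
# Assumes compact one-line JSON on both sides, so a tab is a safe column separator.
# Usage: failed_inputs INPUTS.jsonl RESULTS.jsonl
failed_inputs() {
  local inputs=$1 results=$2
  # paste joins line N of each file with a tab; awk keeps column 1 when column 2 failed.
  paste "$inputs" "$results" \
    | awk -F '\t' '$2 ~ /"ok": *false/ { print $1 }'
}
```

For this to work, the reshaped input has to be saved to a file (for example with `tee`) before it is piped into the upsert, so that both sides are available afterwards.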
CSV/JSONL → upsert stream
Reshape with `jq`, preview with `--dry-run`, then execute. Always lowercase the natural key, because CRM matching is exact. Confirm available property names with `hubspot properties list --type contacts`; never hard-code a list. See `bulk-operations/resources/json-patterns.md` for reshape idioms.

# CSV → JSONL (any tool); example using csvkit
csvjson external.csv | jq -c '.[]' > external.jsonl

# Preview
cat external.jsonl \
  | jq -c '{id:(.email|ascii_downcase), properties:{firstname:.first, lastname:.last, jobtitle:.title, company:.company}}' \
  | hubspot objects upsert --type contacts --id-property email --dry-run | head

# Execute (same pipeline, drop --dry-run, capture results)
cat external.jsonl \
  | jq -c '{id:(.email|ascii_downcase), properties:{firstname:.first, lastname:.last, jobtitle:.title, company:.company}}' \
  | hubspot objects upsert --type contacts --id-property email \
  | tee /tmp/upsert.results.jsonl

Companies: swap in `--type companies --id-property domain` and reshape with `.domain|ascii_downcase` as `id`.
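
Since matching is exact and keys are lowercased before the upsert, two source rows whose keys differ only by case end up targeting the same record. How the CLI treats two lines with the same key in one run is not stated here, so it may be worth catching them before executing. A minimal pre-flight sketch, where `dup_keys` is a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helper (not part of the hubspot CLI).
# Reads natural keys on stdin, one per line, lowercases them,
# and prints each key that occurs more than once.
dup_keys() {
  tr '[:upper:]' '[:lower:]' | sort | uniq -d
}
```

With the file from the pipeline above, `jq -r '.email' external.jsonl | dup_keys` would list the colliding emails; empty output means every key is unique after lowercasing.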

Handle per-record OK / error output
Split with `jq`, inspect failure modes, and retry just the failures after fixing the inputs:

jq -c 'select(.ok==true)' /tmp/upsert.results.jsonl > /tmp/upsert.ok.jsonl
jq -c 'select(.ok==false)' /tmp/upsert.results.jsonl > /tmp/upsert.failed.jsonl
jq -r '.error.status' /tmp/upsert.failed.jsonl | sort | uniq -c # status → count
jq -r '.data.new' /tmp/upsert.ok.jsonl | sort | uniq -c # created vs updated

429s: split the input and rerun smaller chunks (see `bulk-operations` rate-limit notes). 400s usually mean a bad property name or invalid enum value: fix the reshape, then rerun the failed inputs.
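
One way to rerun in smaller chunks is sketched below. `run_chunked` is a hypothetical wrapper, not a `hubspot` subcommand, and the chunk size and pause are arbitrary knobs rather than documented limits; the rate-limit notes in `bulk-operations` remain the reference for real values.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the hubspot CLI).
# Splits FILE into chunks of SIZE lines and feeds each chunk to CMD on stdin,
# sleeping PAUSE seconds after each chunk. CMD output goes to stdout in chunk order.
# Usage: run_chunked FILE SIZE PAUSE CMD [ARGS...]
run_chunked() {
  local file=$1 size=$2 pause=$3
  shift 3
  local tmp chunk
  tmp=$(mktemp -d) || return 1
  # -a 4 widens the suffix (aaaa, aaab, ...) so large files do not exhaust names.
  split -a 4 -l "$size" "$file" "$tmp/chunk."
  for chunk in "$tmp"/chunk.*; do
    [ -e "$chunk" ] || continue   # empty input produces no chunk files
    "$@" < "$chunk"
    sleep "$pause"
  done
  rm -rf "$tmp"
}
```

A retry might then look like `run_chunked /tmp/retry.jsonl 50 2 hubspot objects upsert --type contacts --id-property email`, where `/tmp/retry.jsonl` is an assumed file holding the corrected input lines.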

Destructive-op safety
`upsert` creates or updates records, so run it with `--dry-run` first. See `bulk-operations/SKILL.md` for the safety workflow, and use `hubspot history --since 1h` to review recent changes.

Match without upsert: OR-search → update
When you only want to read matches (no write-back), or the natural key isn't a CRM property, use repeated `--filter` flags; each flag is one OR group.

Verified cap: 5 OR groups per call. 6+ returns `400 too many filterGroups (count: N, max allowed: 5)`. Chunk 5 at a time:

# emails.txt: one lowercased email per line
xargs -n5 < emails.txt | while read -r e1 e2 e3 e4 e5; do
  args=()
  for e in "$e1" "$e2" "$e3" "$e4" "$e5"; do [ -n "$e" ] && args+=(--filter "email=$e"); done
  hubspot objects search --type contacts "${args[@]}" --properties email,firstname,company
done > /tmp/matches.jsonl
jq -c '{id, properties:{lifecyclestage:"marketingqualifiedlead"}}' /tmp/matches.jsonl \
  | hubspot objects update --type contacts --dry-run

For larger keyed enrichments, prefer `upsert`: one pipeline, no chunking math.
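
To gauge when the OR-search route stops being worthwhile, the chunking math is a ceiling division by the cap of 5. A small sketch, with `search_calls` as a hypothetical helper name:

```bash
#!/usr/bin/env bash
# Hypothetical helper: number of search calls needed for N keys
# when each call accepts at most CAP OR groups (default 5, the verified cap).
# Usage: search_calls N [CAP]
search_calls() {
  local n=$1 cap=${2:-5}
  # Integer ceiling division: (n + cap - 1) / cap
  echo $(( (n + cap - 1) / cap ))
}
```

For example, `search_calls 12` prints `3`, and `search_calls "$(wc -l < emails.txt)"` gives the call count for a key file with one email per line.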