data360-prepare
data360-prepare: Data Cloud Prepare Phase
Use this skill when the user needs ingestion and lake preparation work: data streams, Data Lake Objects (DLOs), transforms, Document AI, unstructured ingestion, or the handoff from connector setup into a live stream.
When This Skill Owns the Task
Use `data360-prepare` when the work involves:
- `sf data360 data-stream *`
- `sf data360 dlo *`
- `sf data360 transform *`
- `sf data360 docai *`
- choosing how data should enter Data Cloud
- rerunning or rescanning ingestion after a source update
- preparing Ingestion API-backed streams after connector setup is complete
Delegate elsewhere when the user is:
- still creating/testing source connections → data360-connect
- mapping to DMOs or designing IR/data graphs → data360-harmonize
- querying ingested data → data360-query
Required Context to Gather First
Ask for or infer:
- target org alias
- source connection name
- source object / dataset / document source
- desired stream type
- DLO naming expectations
- whether the user is creating, updating, running, or deleting a stream
- whether the source is CRM, a database connector, an unstructured file source, or an Ingestion API feed
Core Operating Rules
- Verify the external plugin runtime before running Data Cloud commands.
- Run the shared readiness classifier before mutating ingestion assets: `node ../data360-orchestrate/scripts/diagnose-org.mjs -o <org> --phase prepare --json`.
- Prefer inspecting existing streams and DLOs before creating new ingestion assets.
- Suppress linked-plugin warning noise with `2>/dev/null` for normal usage.
- Treat DLO naming and field naming as Data Cloud-specific, not CRM-native.
- Confirm whether each dataset should be treated as `Profile`, `Engagement`, or `Other` before creating the stream.
- Distinguish stream-level refresh from connection-level reruns when working with unstructured sources.
- Use UI setup intentionally when initial stream or unstructured asset creation is platform-gated.
- Hand off to Harmonize only after ingestion assets are clearly healthy.
Recommended Workflow
1. Classify readiness for prepare work
```bash
node ../data360-orchestrate/scripts/diagnose-org.mjs -o <org> --phase prepare --json
```
2. Inspect existing ingestion assets
```bash
sf data360 data-stream list -o <org> 2>/dev/null
sf data360 dlo list -o <org> 2>/dev/null
```
3. Confirm the stream category before creation
Use these rules when suggesting categories:
| Category | Use for | Typical requirement |
|---|---|---|
| `Profile` | person/entity records | primary key |
| `Engagement` | time-based events or interactions | primary key + event time field |
| `Other` | reference/configuration/supporting datasets | primary key |
When the source is ambiguous, ask the user explicitly whether the dataset should be treated as `Profile`, `Engagement`, or `Other`.
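The category rules above can be expressed as a small pre-flight check. This is a minimal sketch, not part of the skill: the names `CATEGORY_REQUIREMENTS` and `missing_requirements` are hypothetical, and the requirement strings simply mirror the table for the three categories (Profile, Engagement, Other).

```python
# Hypothetical helper: requirement labels are copied from the category table.
CATEGORY_REQUIREMENTS = {
    "Profile": ("primary key",),
    "Engagement": ("primary key", "event time field"),
    "Other": ("primary key",),
}


def missing_requirements(category, available):
    """Return the requirements for `category` that are absent from `available`."""
    if category not in CATEGORY_REQUIREMENTS:
        raise ValueError(f"unknown category: {category!r}")
    return [req for req in CATEGORY_REQUIREMENTS[category] if req not in available]


# An event dataset without a timestamp column is not ready to be an Engagement stream.
print(missing_requirements("Engagement", {"primary key"}))  # ['event time field']
```

A non-empty result is a prompt to ask the user for the missing field before creating the stream, rather than a reason to pick a different category silently.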
4. Create or inspect streams intentionally
```bash
sf data360 data-stream get -o <org> --name <stream> 2>/dev/null
sf data360 data-stream create-from-object -o <org> --object Contact --connection SalesforceDotCom_Home 2>/dev/null
sf data360 data-stream create -o <org> -f stream.json 2>/dev/null
sf data360 data-stream run -o <org> --name <stream> 2>/dev/null
```
5. Check DLO shape
```bash
sf data360 dlo get -o <org> --name Contact_Home__dll 2>/dev/null
```
6. Choose the right refresh mechanism
Use the smaller refresh scope that matches the user goal:
```bash
sf data360 data-stream run -o <org> --name <stream> 2>/dev/null
sf data360 connection run-existing -o <org> --name <connection-id> 2>/dev/null
```
- `data-stream run` is the closest match to a stream-level refresh or re-scan.
- `connection run-existing` runs at the connection level and can be useful for some connector workflows, but it is not a reliable replacement for stream refresh on unstructured sources.
- For unstructured document connectors, prefer `data-stream run` when the goal is to re-scan newly added or changed files.
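The scope choice above can be sketched as a small command builder. This is an illustrative helper only: `refresh_command` is a hypothetical name, the org alias and stream name in the usage line are placeholders, and the only CLI arguments used are the ones already shown in this step.

```python
import shlex


def refresh_command(org, *, stream=None, connection_id=None):
    """Build the argv for a refresh, preferring the stream-level scope."""
    if stream:
        # Stream-level refresh: the preferred path for unstructured rescans.
        return ["sf", "data360", "data-stream", "run", "-o", org, "--name", stream]
    if connection_id:
        # Connection-level rerun: only when no specific stream is targeted.
        return ["sf", "data360", "connection", "run-existing",
                "-o", org, "--name", connection_id]
    raise ValueError("provide a stream name or a connection id")


# Placeholder values; print the command for review instead of running it.
print(shlex.join(refresh_command("my-org", stream="my_stream")))
```

When both a stream and a connection are known, the helper deliberately returns the stream-level command, matching the guidance to use the smaller scope.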
7. Handle unstructured sources deliberately
For SharePoint-style document ingestion, a minimal unstructured DLO payload can look like:
```json
{
  "name": "my_udlo",
  "label": "My UDLO",
  "category": "Directory_Table",
  "dataSource": {
    "sourceType": "SF_DRIVE",
    "directoryAndFilesDetails": [
      {
        "dirName": "SPUnstructuredDocument/<CONNECTION_ID>/<SITE_ID>",
        "fileName": "*"
      }
    ],
    "sourceConfig": {
      "reservedPrefix": "$dcf_content$"
    }
  }
}
```
Use the UI for the first-time unstructured setup when the user needs the richer end-to-end pipeline. The UI path can seed additional document metadata fields and downstream assets that a bare CLI DLO create flow may not provision automatically.
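If the minimal payload shown above needs to be generated for several sites, a small builder avoids hand-editing the `dirName` path. This is a sketch under stated assumptions: `build_udlo_payload` is a hypothetical helper, the field names and constant values are copied from the example payload, and the IDs in the usage line are placeholders rather than real identifiers.

```python
import json


def build_udlo_payload(name, label, connection_id, site_id):
    """Return the minimal unstructured DLO payload for one document site."""
    return {
        "name": name,
        "label": label,
        "category": "Directory_Table",
        "dataSource": {
            "sourceType": "SF_DRIVE",
            "directoryAndFilesDetails": [
                {
                    # Path layout taken from the example payload.
                    "dirName": f"SPUnstructuredDocument/{connection_id}/{site_id}",
                    "fileName": "*",
                }
            ],
            "sourceConfig": {"reservedPrefix": "$dcf_content$"},
        },
    }


# Placeholder IDs; substitute the real connection and site identifiers.
payload = build_udlo_payload("my_udlo", "My UDLO", "CONNECTION_ID", "SITE_ID")
print(json.dumps(payload, indent=2))
```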
8. Use the local Ingestion API example for send-data workflows
For external systems pushing records into Data Cloud:
- create the connector in data360-connect
- upload the schema with `sf data360 connection schema-upsert`
- create the stream in the UI when required
- send records with the local example in `examples/ingestion-api/`
```bash
cd examples/ingestion-api
cp .env.example .env
python3 send-data.py
```
Key details:
- auth is a staged flow: JWT → Salesforce token → Data Cloud token
- the ingestion endpoint uses the tenant URL, not the Salesforce instance URL
- `202` means the payload was accepted for processing, not that records are queryable immediately
- validation failures often surface in the Problem Records DLO family
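Two of the key details above lend themselves to small guards in a sender script. The sketch below is not taken from `send-data.py`; `tenant_base_url` and `interpret_ingest_status` are hypothetical helpers, the hostname in the usage lines is a placeholder, and the wording of the returned messages is an assumption. It makes no network calls.

```python
from urllib.parse import urlsplit


def tenant_base_url(tenant_url):
    """Normalize the Data Cloud tenant URL to an https base without a path."""
    candidate = tenant_url if "://" in tenant_url else f"https://{tenant_url}"
    host = urlsplit(candidate).netloc
    if not host:
        raise ValueError("tenant URL is empty or malformed")
    return f"https://{host}"


def interpret_ingest_status(status_code):
    """Describe what an ingestion response status means for the caller."""
    if status_code == 202:
        # Accepted is not the same as queryable: processing is asynchronous.
        return "accepted for processing; verify later and check Problem Records DLOs"
    return "not accepted; inspect the response body before retrying"


print(tenant_base_url("tenant.example.com/"))
print(interpret_ingest_status(202))
```

Keeping the tenant URL in its own variable, separate from the Salesforce instance URL, makes it harder to post records to the wrong host by accident.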
9. Only then move into harmonization
Once the stream and DLO are healthy, hand off to data360-harmonize.
High-Signal Gotchas
高风险注意事项
- CRM-backed stream behavior is not the same as fully custom connector-framework ingestion.
- `sf data360 data-stream run` and `sf data360 connection run-existing` are not interchangeable; prefer stream-level refresh for unstructured rescans.
- `SFDC` streams sync on a platform-managed schedule; `data-stream run` is not the general control path for CRM connector refresh.
- Some external database connectors can be created via API while stream creation still requires UI flow or org-specific browser automation. Do not promise a pure CLI stream-creation path for every connector type.
- Initial SharePoint-style unstructured setup can be richer in the UI than in a minimal CLI DLO create flow.
- Stream deletion can also delete the associated DLO unless the delete mode says otherwise.
- DLO field naming differs from CRM field naming, including `__c` → `_c` transformations.
- Query DLO record counts with Data Cloud SQL instead of assuming list output is sufficient.
- `CdpDataStreams` means the stream module is gated for the current org/user; guide the user to provisioning/permissions review instead of retrying blindly.
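The last gotcha can be turned into a simple guard around failed stream commands. This is a hedged sketch: `next_action` is a hypothetical helper, it assumes the `CdpDataStreams` marker appears somewhere in the captured error text, and the sample error strings below are invented for illustration, not real CLI output.

```python
GATED_MARKER = "CdpDataStreams"


def next_action(error_text):
    """Suggest a next step for a failed stream command based on its error text."""
    if GATED_MARKER in error_text:
        # Gated module: retrying will not help until access is provisioned.
        return "review provisioning and permissions"
    return "inspect the error details"


# Invented sample messages, used only to exercise both branches.
print(next_action("resource CdpDataStreams is not available"))
print(next_action("stream not found"))
```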
Output Format
```text
Prepare task: <stream / dlo / transform / docai>
Source: <connection + object>
Target org: <alias>
Artifacts: <stream names / dlo names / json definitions>
Verification: <passed / partial / blocked>
Next step: <harmonize or retrieve>
```
References
- README.md
- examples/ingestion-api/README.md
- ../data360-orchestrate/assets/definitions/data-stream.template.json
- ../data360-orchestrate/references/plugin-setup.md
- ../data360-orchestrate/references/feature-readiness.md