preparing-datacloud

preparing-datacloud: Data Cloud Prepare Phase

Use this skill when the user needs ingestion and lake preparation work: data streams, Data Lake Objects (DLOs), transforms, Document AI, unstructured ingestion, or the handoff from connector setup into a live stream.

When This Skill Owns the Task

Use `preparing-datacloud` when the work involves:

`sf data360 data-stream *`
`sf data360 dlo *`
`sf data360 transform *`
`sf data360 docai *`
choosing how data should enter Data Cloud
rerunning or rescanning ingestion after a source update
preparing Ingestion API-backed streams after connector setup is complete

Delegate elsewhere when the user is:

still creating/testing source connections → connecting-datacloud
mapping to DMOs or designing IR/data graphs → harmonizing-datacloud
querying ingested data → retrieving-datacloud
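
The ownership and delegation rules above can be sketched as a small lookup. This is only an illustration: the topic labels are hypothetical names invented for the example (including reading "IR" as identity resolution), while the skill names come from the text.

```python
from typing import Optional

# Hypothetical topic labels mapped to the skills named in this document.
SKILL_BY_TOPIC = {
    "data-stream": "preparing-datacloud",
    "dlo": "preparing-datacloud",
    "transform": "preparing-datacloud",
    "docai": "preparing-datacloud",
    "source-connection": "connecting-datacloud",
    "dmo-mapping": "harmonizing-datacloud",
    "identity-resolution": "harmonizing-datacloud",  # assumed meaning of "IR"
    "data-graph": "harmonizing-datacloud",
    "query": "retrieving-datacloud",
}


def owning_skill(topic: str) -> Optional[str]:
    """Return the skill that owns a topic, or None when the topic is unknown."""
    return SKILL_BY_TOPIC.get(topic.strip().lower())
```

Returning `None` for unknown topics keeps the caller responsible for asking the user rather than guessing an owner.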

Required Context to Gather First

Ask for or infer:

target org alias
source connection name
source object / dataset / document source
desired stream type
DLO naming expectations
whether the user is creating, updating, running, or deleting a stream
whether the source is CRM, a database connector, an unstructured file source, or an Ingestion API feed
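
One way to track the context listed above is a small record that reports which items are still unknown. The class and field names below are hypothetical; only the items themselves come from the list.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PrepareContext:
    """Context to gather before prepare work (field names are illustrative)."""
    org_alias: Optional[str] = None
    connection_name: Optional[str] = None
    source_object: Optional[str] = None   # object, dataset, or document source
    stream_type: Optional[str] = None
    dlo_naming: Optional[str] = None
    action: Optional[str] = None          # create | update | run | delete
    source_kind: Optional[str] = None     # crm | database | unstructured | ingestion-api

    def missing(self) -> List[str]:
        """Names of the fields that still need to be asked for or inferred."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]
```

For example, `PrepareContext(org_alias="myorg", action="run").missing()` lists the five items that still need an answer.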

Core Operating Rules

Verify the external plugin runtime before running Data Cloud commands.

Run the shared readiness classifier before mutating ingestion assets: `node ~/.claude/skills/orchestrating-datacloud/scripts/diagnose-org.mjs -o <org> --phase prepare --json`

Prefer inspecting existing streams and DLOs before creating new ingestion assets.
Suppress linked-plugin warning noise with `2>/dev/null` for normal usage.
Treat DLO naming and field naming as Data Cloud-specific, not CRM-native.
Confirm whether each dataset should be treated as `Profile`, `Engagement`, or `Other` before creating the stream.
Distinguish stream-level refresh from connection-level reruns when working with unstructured sources.
Use UI setup intentionally when initial stream or unstructured asset creation is platform-gated.
Hand off to Harmonize only after ingestion assets are clearly healthy.
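
When these commands are driven from a script rather than a shell, the rules above might look like the following sketch. The helper names are hypothetical; the command strings and flags are the ones shown in this document, and discarding stderr stands in for `2>/dev/null`. The shape of the classifier's JSON output is not assumed here.

```python
import os
import subprocess
from typing import List

# Path taken from the document; expanded at call time.
DIAGNOSE_SCRIPT = "~/.claude/skills/orchestrating-datacloud/scripts/diagnose-org.mjs"


def diagnose_argv(org: str) -> List[str]:
    """Build the readiness-classifier command for the prepare phase."""
    script = os.path.expanduser(DIAGNOSE_SCRIPT)
    return ["node", script, "-o", org, "--phase", "prepare", "--json"]


def sf_argv(*parts: str, org: str) -> List[str]:
    """Build an `sf data360 ...` command for the given org alias."""
    return ["sf", "data360", *parts, "-o", org]


def run_quiet(argv: List[str]) -> str:
    """Run a command and return stdout, discarding stderr (like 2>/dev/null)."""
    result = subprocess.run(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        text=True,
        check=False,
    )
    return result.stdout
```

A caller would typically run `run_quiet(diagnose_argv(org))` first and only then issue commands built with `sf_argv`.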

Recommended Workflow

1. Classify readiness for prepare work

```bash
node ~/.claude/skills/orchestrating-datacloud/scripts/diagnose-org.mjs -o <org> --phase prepare --json
```

2. Inspect existing ingestion assets

```bash
sf data360 data-stream list -o <org> 2>/dev/null
sf data360 dlo list -o <org> 2>/dev/null
```

3. Confirm the stream category before creation

Use these rules when suggesting categories:

| Category | Use for | Typical requirement |
| --- | --- | --- |
| `Profile` | person/entity records | primary key |
| `Engagement` | time-based events or interactions | primary key + event time field |
| `Other` | reference/configuration/supporting datasets | primary key |

When the source is ambiguous, ask the user explicitly whether the dataset should be treated as `Profile`, `Engagement`, or `Other`.
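
The category rules above can be expressed as a short helper. This is a sketch under the assumption that the caller already knows whether the dataset holds person/entity records or time-based events; the function names are hypothetical, and an ambiguous source should still be confirmed with the user.

```python
from typing import List


def suggest_category(is_person_or_entity: bool, is_time_based_event: bool) -> str:
    """Suggest a stream category from two coarse facts about the dataset."""
    if is_time_based_event:
        return "Engagement"
    if is_person_or_entity:
        return "Profile"
    return "Other"


def missing_requirements(category: str, has_primary_key: bool,
                         has_event_time: bool) -> List[str]:
    """List the typical requirements the dataset does not yet satisfy."""
    missing = []
    if not has_primary_key:
        missing.append("primary key")
    if category == "Engagement" and not has_event_time:
        missing.append("event time field")
    return missing
```

A non-empty result from `missing_requirements` is a signal to resolve the gap before creating the stream.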

4. Create or inspect streams intentionally

```bash
sf data360 data-stream get -o <org> --name <stream> 2>/dev/null
sf data360 data-stream create-from-object -o <org> --object Contact --connection SalesforceDotCom_Home 2>/dev/null
sf data360 data-stream create -o <org> -f stream.json 2>/dev/null
sf data360 data-stream run -o <org> --name <stream> 2>/dev/null
```

5. Check DLO shape

```bash
sf data360 dlo get -o <org> --name Contact_Home__dll 2>/dev/null
```

6. Choose the right refresh mechanism

Use the smallest refresh scope that matches the user's goal:

```bash
sf data360 data-stream run -o <org> --name <stream> 2>/dev/null
sf data360 connection run-existing -o <org> --name <connection-id> 2>/dev/null
```

`data-stream run` is the closest match to a stream-level refresh or re-scan.
`connection run-existing` runs at the connection level and can be useful for some connector workflows, but it is not a reliable replacement for stream refresh on unstructured sources.
For unstructured document connectors, prefer `data-stream run` when the goal is to re-scan newly added or changed files.
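
The choice between the two refresh commands could be encoded roughly as follows. The function name and its parameters are hypothetical; the commands and flags are the ones shown above. The sketch refuses to fall back to a connection-level run for unstructured sources, which is a stricter policy than the text requires.

```python
from typing import List, Optional


def refresh_argv(org: str, stream: Optional[str] = None,
                 connection_id: Optional[str] = None,
                 unstructured: bool = False) -> List[str]:
    """Pick the smallest refresh scope: stream-level first, then connection-level."""
    if stream:
        return ["sf", "data360", "data-stream", "run", "-o", org, "--name", stream]
    if unstructured:
        # A connection-level run is not a reliable rescan for unstructured sources.
        raise ValueError("unstructured rescans need a stream name")
    if connection_id:
        return ["sf", "data360", "connection", "run-existing",
                "-o", org, "--name", connection_id]
    raise ValueError("provide a stream name or a connection id")
```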

7. Handle unstructured sources deliberately

For SharePoint-style document ingestion, a minimal unstructured DLO payload can look like:

```json
{
  "name": "my_udlo",
  "label": "My UDLO",
  "category": "Directory_Table",
  "dataSource": {
    "sourceType": "SF_DRIVE",
    "directoryAndFilesDetails": [
      {
        "dirName": "SPUnstructuredDocument/<CONNECTION_ID>/<SITE_ID>",
        "fileName": "*"
      }
    ],
    "sourceConfig": {
      "reservedPrefix": "$dcf_content$"
    }
  }
}
```

Use the UI for the first-time unstructured setup when the user needs the richer end-to-end pipeline. The UI path can seed additional document metadata fields and downstream assets that a bare CLI DLO create flow may not provision automatically.
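
If the minimal payload needs to be generated per connection, a small builder along these lines may help. The function name is hypothetical; the keys and constant values mirror the example payload above and nothing beyond it.

```python
import json


def unstructured_dlo_payload(name: str, label: str,
                             connection_id: str, site_id: str) -> dict:
    """Build the minimal unstructured DLO payload shown in the example above."""
    return {
        "name": name,
        "label": label,
        "category": "Directory_Table",
        "dataSource": {
            "sourceType": "SF_DRIVE",
            "directoryAndFilesDetails": [
                {
                    "dirName": f"SPUnstructuredDocument/{connection_id}/{site_id}",
                    "fileName": "*",
                }
            ],
            "sourceConfig": {"reservedPrefix": "$dcf_content$"},
        },
    }


def to_json(payload: dict) -> str:
    """Serialize the payload for writing to a definition file."""
    return json.dumps(payload, indent=2)
```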

8. Use the local Ingestion API example for send-data workflows

For external systems pushing records into Data Cloud:

create the connector in connecting-datacloud
upload the schema with `sf data360 connection schema-upsert`
create the stream in the UI when required
send records with the local example in `examples/ingestion-api/`

```bash
cd examples/ingestion-api
cp .env.example .env
python3 send-data.py
```

Key details:

auth is a staged flow: JWT → Salesforce token → Data Cloud token
the ingestion endpoint uses the tenant URL, not the Salesforce instance URL
`202` means the payload was accepted for processing, not that records are queryable immediately
validation failures often surface in the Problem Records DLO family
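
Two of the details above, the tenant-based endpoint and the meaning of `202`, can be illustrated without any network calls. This is a sketch, not the contents of `send-data.py`: the function names are hypothetical, and the `/api/v1/ingest/sources/...` path is an assumption that should be checked against the local example and the Ingestion API documentation.

```python
def ingestion_url(tenant_url: str, source_api_name: str, object_name: str) -> str:
    """Build an ingestion URL on the tenant host (path layout is an assumption)."""
    base = tenant_url.strip().rstrip("/")
    if not base.startswith(("http://", "https://")):
        base = "https://" + base
    return f"{base}/api/v1/ingest/sources/{source_api_name}/{object_name}"


def describe_status(status_code: int) -> str:
    """Interpret an ingestion response code conservatively."""
    if status_code == 202:
        return "accepted for processing; records may not be queryable yet"
    return f"unexpected status {status_code}; inspect the response body"
```

The tenant host passed in is the one returned by the Data Cloud token step, not the Salesforce instance URL used earlier in the auth flow.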

9. Only then move into harmonization

Once the stream and DLO are healthy, hand off to harmonizing-datacloud.

High-Signal Gotchas

CRM-backed stream behavior is not the same as fully custom connector-framework ingestion.
`sf data360 data-stream run` and `sf data360 connection run-existing` are not interchangeable; prefer stream-level refresh for unstructured rescans.
`SFDC` streams sync on a platform-managed schedule; `data-stream run` is not the general control path for CRM connector refresh.
Some external database connectors can be created via API while stream creation still requires UI flow or org-specific browser automation. Do not promise a pure CLI stream-creation path for every connector type.
Initial SharePoint-style unstructured setup can be richer in the UI than in a minimal CLI DLO create flow.
Stream deletion can also delete the associated DLO unless the delete mode says otherwise.
DLO field naming differs from CRM field naming, including `__c` → `_c` transformations.
Query DLO record counts with Data Cloud SQL instead of assuming list output is sufficient.
`CdpDataStreams` means the stream module is gated for the current org/user; guide the user to provisioning/permissions review instead of retrying blindly.
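
Two of these gotchas lend themselves to small helpers: building a count query for a DLO and recognizing the gating marker before retrying. Both function names are hypothetical, the SQL is a plain `COUNT(*)` sketch, and the check simply looks for the `CdpDataStreams` string in whatever message the command returned.

```python
def count_query(dlo_name: str) -> str:
    """Build a record-count query for a DLO, rejecting unexpected characters."""
    if not dlo_name or not dlo_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected DLO name: {dlo_name!r}")
    return f"SELECT COUNT(*) FROM {dlo_name}"


def looks_gated(message: str) -> bool:
    """True when a message mentions the CdpDataStreams gating marker."""
    return "CdpDataStreams" in message
```

When `looks_gated` returns true, the next step is a provisioning or permissions review rather than another attempt at the same command.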

Output Format

```text
Prepare task: <stream / dlo / transform / docai>
Source: <connection + object>
Target org: <alias>
Artifacts: <stream names / dlo names / json definitions>
Verification: <passed / partial / blocked>
Next step: <harmonize or retrieve>
```
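
A report in this format could be produced by a helper like the one below. The function name and argument names are hypothetical; the labels and the three verification values come from the template above.

```python
from typing import Sequence

VERIFICATION_STATES = ("passed", "partial", "blocked")


def render_report(task: str, source: str, org: str,
                  artifacts: Sequence[str], verification: str,
                  next_step: str) -> str:
    """Render the prepare-phase summary in the documented line order."""
    if verification not in VERIFICATION_STATES:
        raise ValueError(f"verification must be one of {VERIFICATION_STATES}")
    lines = [
        f"Prepare task: {task}",
        f"Source: {source}",
        f"Target org: {org}",
        f"Artifacts: {', '.join(artifacts)}",
        f"Verification: {verification}",
        f"Next step: {next_step}",
    ]
    return "\n".join(lines)
```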

References

README.md
examples/ingestion-api/README.md
../orchestrating-datacloud/assets/definitions/data-stream.template.json
../orchestrating-datacloud/references/plugin-setup.md
../orchestrating-datacloud/references/feature-readiness.md
