sap-hana-cloud-data-intelligence

SAP HANA Cloud Data Intelligence Skill

This skill provides comprehensive guidance for developing with SAP Data Intelligence Cloud, including pipeline creation, operator development, data integration, and machine learning scenarios.

When to Use This Skill

Use this skill when:

Creating or modifying data processing graphs/pipelines
Developing custom operators (Gen1 or Gen2)
Integrating ABAP-based SAP systems (S/4HANA, BW)
Building replication flows for data movement
Developing ML scenarios with ML Scenario Manager
Working with JupyterLab in Data Intelligence
Using Data Transformation Language (DTL) functions
Configuring subengines (Python, Node.js, C++)
Working with structured data operators

Core Concepts

Graphs (Pipelines)

Graphs are networks of operators connected via typed input/output ports for data transfer.

Two Generations:

Gen1 Operators: Legacy operators, broad compatibility
Gen2 Operators: Enhanced error recovery, state management, snapshots

Critical Rule: A graph cannot mix Gen1 and Gen2 operators; choose one generation per graph.

Gen2 Advantages:

Automatic error recovery with snapshots
State management with periodic checkpoints
Native multiplexing (one-to-many, many-to-one)
Improved Python3 operator
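
The one-generation-per-graph rule above can be sketched as a simple pre-flight check. This is a minimal illustration in plain Python, not the Modeler's own validation: the `Operator` class and `check_single_generation` function are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """Hypothetical stand-in for an operator in a graph definition."""
    name: str
    generation: int  # 1 for Gen1, 2 for Gen2


def check_single_generation(operators):
    """Return the graph's generation, or raise if Gen1 and Gen2 are mixed."""
    generations = {op.generation for op in operators}
    if not generations:
        raise ValueError("Graph has no operators")
    if len(generations) > 1:
        listing = ", ".join(f"{op.name} (Gen{op.generation})" for op in operators)
        raise ValueError(f"Graph mixes operator generations: {listing}")
    return generations.pop()


gen2_graph = [Operator("read_file", 2), Operator("python3", 2)]
print(check_single_generation(gen2_graph))  # 2
```

Running a check like this before building a large graph surfaces a mixed-generation design early, when switching generations is still cheap.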

Operators

Operators are the building blocks that process data within graphs. Each operator has:

Ports: Typed input/output connections for data flow
Configuration: Parameters that control behavior
Runtime: Engine that executes the operator

Operator Categories:

Messaging (Kafka, MQTT, NATS)
Storage (Files, HDFS, S3, Azure, GCS)
Database (HANA, SAP BW, SQL)
Script (Python, JavaScript, R, Go)
Data Processing (Transform, Anonymize, Validate)
Machine Learning (TensorFlow, PyTorch, HANA ML)
Integration (OData, REST, SAP CPI)
Workflow (Pipeline, Data Workflow)

Subengines

Subengines enable operators to run on different runtimes within the same graph.

Supported Subengines:

ABAP: For ABAP Pipeline Engine operators
Python 3.9: For Python-based operators
Node.js: For JavaScript-based operators
C++: For high-performance native operators

Key Benefit: Connected operators on the same subengine run in a single OS process for optimal performance.

Trade-off: Cross-engine communication incurs serialization/deserialization overhead.
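
To make the trade-off concrete, the sketch below counts how many connections in a graph cross a subengine boundary, since each such connection pays the serialization cost. It is an illustration under assumptions: the operator names, the engine labels, and the `count_cross_engine_edges` helper are all hypothetical, and the graph is modelled as a plain list of (source, target) pairs.

```python
def count_cross_engine_edges(engine_of, connections):
    """Count connections whose two endpoints run on different subengines."""
    return sum(1 for src, dst in connections if engine_of[src] != engine_of[dst])


# Hypothetical operators mapped to the subengine that runs each of them.
engine_of = {
    "parse": "python",
    "enrich": "python",
    "score": "cpp",
    "publish": "nodejs",
}
connections = [("parse", "enrich"), ("enrich", "score"), ("score", "publish")]

print(count_cross_engine_edges(engine_of, connections))  # 2
```

Here `parse` and `enrich` share the Python subengine, so only the two hops into and out of `score` cross an engine boundary. Regrouping operators to lower this count is one way to apply the advice above.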

Quick Start Patterns

Basic Graph Creation

1. Open SAP Data Intelligence Modeler
2. Create new graph
3. Add operators from repository
4. Connect operator ports (matching types)
5. Configure operator parameters
6. Validate graph
7. Execute and monitor
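
Step 4 above (connecting ports with matching types) can be illustrated with a small check that lists connections whose port types differ. This is a simplified sketch: the `Port` tuple, the type names, and `find_type_mismatches` are hypothetical, and exact string equality is an assumption that is stricter than the compatibility rules the Modeler actually applies.

```python
from typing import NamedTuple


class Port(NamedTuple):
    """Hypothetical description of one operator port."""
    operator: str
    name: str
    data_type: str


def find_type_mismatches(connections):
    """Return the (source, target) pairs whose port types differ."""
    return [(src, dst) for src, dst in connections if src.data_type != dst.data_type]


connections = [
    (Port("reader", "out", "message"), Port("parser", "in", "message")),
    (Port("parser", "out", "string"), Port("writer", "in", "message")),
]

for src, dst in find_type_mismatches(connections):
    print(f"{src.operator}.{src.name} ({src.data_type}) -> "
          f"{dst.operator}.{dst.name} ({dst.data_type})")
# parser.out (string) -> writer.in (message)
```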

Replication Flow Pattern

1. Create replication flow in Modeler
2. Configure source connection (ABAP, HANA, etc.)
3. Configure target (HANA Cloud, S3, Kafka, etc.)
4. Add tasks with source objects
5. Define filters and mappings
6. Validate flow
7. Deploy to tenant repository
8. Run and monitor

Delivery Guarantees:

Default: At-least-once (may have duplicates)
With UPSERT to databases: Effectively exactly-once (a redelivered record overwrites the existing row instead of adding a duplicate)
For cloud storage: Use the "Suppress Duplicates" option
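
The difference between the default guarantee and the UPSERT case can be shown with a small simulation. This is a sketch in plain Python, not replication flow code: the record layout, the `id` key, and both helper functions are assumptions made for illustration.

```python
def append_all(target, records):
    """Append-only target: a redelivered record is stored twice."""
    target.extend(records)


def upsert_all(target, records, key="id"):
    """Keyed target: a redelivered record overwrites its earlier copy."""
    for record in records:
        target[record[key]] = record


# At-least-once delivery: the record with id 2 arrives twice.
delivered = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 20},
    {"id": 2, "amount": 20},
]

appended = []
append_all(appended, delivered)

upserted = {}
upsert_all(upserted, delivered)

print(len(appended))  # 3
print(len(upserted))  # 2
```

Because the upsert is idempotent per key, replaying the same record leaves the target unchanged, which is why a keyed database target ends up with each record once even though delivery itself is at-least-once.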

ML Scenario Pattern

1. Open ML Scenario Manager from launchpad
2. Create new scenario
3. Add datasets (register data sources)
4. Create Jupyter notebooks for experiments
5. Build training pipelines
6. Track metrics with Metrics Explorer
7. Version scenario for reproducibility
8. Deploy model pipeline

Common Tasks

ABAP System Integration

For integrating ABAP-based SAP systems:

Prerequisites: Configure Cloud Connector for on-premise systems
Connection Setup: Create ABAP connection in Connection Management
Metadata Access: Use Metadata Explorer for object discovery
Data Sources: CDS Views, ODP (Operational Data Provisioning), Tables

Reference: See references/abap-integration.md for detailed setup.

Structured Data Processing

Use structured data operators for SQL-like transformations:

Data Transform: Visual SQL editor for complex transformations
Aggregation Node: GROUP BY with aggregation functions
Join Node: INNER, LEFT, RIGHT, FULL joins
Projection Node: Column selection and renaming
Union Node: Combine multiple datasets
Case Node: Conditional logic

Reference: See references/structured-data-operators.md for configuration.
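
For readers who think in code rather than in visual nodes, the sketch below mimics what a Join, an Aggregation, and a Projection node do to a small dataset. It uses only the Python standard library and made-up sample data; it is meant to show the SQL-like semantics, not how the structured data operators are configured or executed.

```python
from collections import defaultdict

# Hypothetical sample data.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 80.0},
    {"order_id": 3, "customer_id": "C1", "amount": 50.0},
]
customers = {"C1": "Acme", "C2": "Globex"}

# Join node (INNER): keep orders that have a matching customer.
joined = [
    {**order, "customer": customers[order["customer_id"]]}
    for order in orders
    if order["customer_id"] in customers
]

# Aggregation node: GROUP BY customer with SUM(amount).
totals = defaultdict(float)
for row in joined:
    totals[row["customer"]] += row["amount"]

# Projection node: select the output columns and rename them.
result = [
    {"customer_name": name, "total_amount": total}
    for name, total in sorted(totals.items())
]
print(result)
# [{'customer_name': 'Acme', 'total_amount': 170.0}, {'customer_name': 'Globex', 'total_amount': 80.0}]
```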

Data Transformation Language

DTL provides SQL-like functions for data processing:

Function Categories:

String: CONCAT, SUBSTRING, UPPER, LOWER, TRIM, REPLACE
Numeric: ABS, CEIL, FLOOR, ROUND, MOD, POWER
Date/Time: ADD_DAYS, MONTHS_BETWEEN, EXTRACT, CURRENT_UTCTIMESTAMP
Conversion: TO_DATE, TO_STRING, TO_INTEGER, TO_DECIMAL
Miscellaneous: CASE, COALESCE, IFNULL, NULLIF

Reference: See references/dtl-functions.md for complete reference.
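
The null-handling functions in the Miscellaneous category are the ones most often confused. The Python sketch below shows their behaviour, assuming DTL follows the usual SQL semantics for COALESCE, IFNULL, and NULLIF; the Python function names are illustrative stand-ins, and `None` plays the role of NULL.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def ifnull(value, fallback):
    """Return fallback when value is None, otherwise value."""
    return fallback if value is None else value


def nullif(left, right):
    """Return None when both arguments are equal, otherwise left."""
    return None if left == right else left


print(coalesce(None, None, "unknown"))  # unknown
print(ifnull(None, 0))                  # 0
print(nullif("N/A", "N/A"))             # None
```

A common combination is wrapping NULLIF in COALESCE to turn a placeholder such as "N/A" into a real default value.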

Best Practices

Graph Design

Choose Generation Early: Decide Gen1 vs Gen2 before building
Minimize Cross-Engine Communication: Group operators by subengine
Use Appropriate Port Types: Match data types for efficient transfer
Enable Snapshots: For Gen2 graphs, enable auto-recovery
Validate Before Execution: Always validate graphs

Operator Development

Start with Built-in Operators: Use predefined operators first
Extend When Needed: Create custom operators for specific needs
Use Script Operators: For quick prototyping with Python/JS
Version Your Operators: Track changes with operator versions
Document Configuration: Describe all parameters

Replication Flows

Plan Target Schema: Understand target structure requirements
Use Filters: Reduce data volume with source filters
Handle Duplicates: Configure for exactly-once when possible
Monitor Execution: Track progress and errors
Clean Up Artifacts: Remove source artifacts after completion

ML Scenarios

Version Early: Create versions before major changes
Track All Metrics: Use SDK for comprehensive tracking
Use Notebooks for Exploration: JupyterLab for experimentation
Productionize with Pipelines: Convert notebooks to pipelines
Export/Import for Migration: Use ZIP export for transfers

Error Handling

Common Graph Errors

| Error | Cause | Solution |
| --- | --- | --- |
| Port type mismatch | Incompatible data types | Use converter operator or matching types |
| Gen1/Gen2 mixing | Combined operator generations | Use single generation per graph |
| Resource exhaustion | Insufficient memory/CPU | Adjust resource requirements |
| Connection failure | Network or credentials | Verify connection settings |
| Validation errors | Invalid configuration | Review error messages, fix config |

Recovery Strategies

Gen2 Graphs:

Enable automatic recovery in graph settings
Configure snapshot intervals
Monitor recovery status
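
The idea behind snapshot-based recovery can be sketched outside the product: an operator periodically saves its state, and after a failure a fresh instance is rebuilt from the last saved state and resumes from there. The `RunningTotal` class and its `snapshot`/`restore` methods below are hypothetical and do not reflect the Gen2 operator API; they only illustrate the pattern.

```python
import json


class RunningTotal:
    """Hypothetical stateful operator that sums the values it receives."""

    def __init__(self):
        self.total = 0
        self.processed = 0

    def process(self, value):
        self.total += value
        self.processed += 1

    def snapshot(self):
        """Serialize the operator state so it can be stored as a checkpoint."""
        return json.dumps({"total": self.total, "processed": self.processed})

    @classmethod
    def restore(cls, blob):
        """Rebuild an operator from a previously stored checkpoint."""
        state = json.loads(blob)
        operator = cls()
        operator.total = state["total"]
        operator.processed = state["processed"]
        return operator


values = [5, 7, 11, 13]

operator = RunningTotal()
for value in values[:2]:
    operator.process(value)
checkpoint = operator.snapshot()  # checkpoint taken after two inputs

# Simulated failure: rebuild from the checkpoint and resume with the
# inputs that the checkpoint does not cover yet.
recovered = RunningTotal.restore(checkpoint)
for value in values[recovered.processed:]:
    recovered.process(value)

print(recovered.total)  # 36
```

The snapshot interval controls how much work has to be repeated after a failure: a shorter interval means less reprocessing at the cost of more frequent state writes.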

Gen1 Graphs:

Implement manual error handling in operators
Use try-catch in script operators
Configure retry logic
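
For Gen1 graphs, the try-catch and retry advice above might look like the following inside a script operator. This is a generic Python sketch with exponential backoff, not Data Intelligence API code: `call_with_retry`, its parameters, and the `flaky_send` example are hypothetical names, and the choice of `ConnectionError` as the retryable error is an assumption.

```python
import time


def call_with_retry(action, attempts=3, base_delay=0.5,
                    retry_on=(ConnectionError,), sleep=time.sleep):
    """Call action(); on a retryable error wait and try again, up to attempts times."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...


calls = {"count": 0}


def flaky_send():
    """Hypothetical action that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "sent"


# The sleep function is replaced here so the example runs instantly.
print(call_with_retry(flaky_send, sleep=lambda seconds: None))  # sent
```

Only errors listed in `retry_on` are retried, so programming errors such as a `KeyError` still fail fast instead of being masked by the retry loop.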

Reference Files

For detailed information, see:

references/operators-reference.md: Complete operator catalog (266 operators)
references/abap-integration.md: ABAP/S4HANA/BW integration with SAP Notes
references/structured-data-operators.md: Structured data processing
references/dtl-functions.md: Data Transformation Language (79 functions)
references/ml-scenario-manager.md: ML Scenario Manager, SDK, artifacts
references/subengines.md: Python, Node.js, C++ subengine development
references/graphs-pipelines.md: Graph execution, snapshots, recovery
references/replication-flows.md: Replication flows, cloud storage, Kafka
references/data-workflow.md: Data workflow operators, orchestration
references/security-cdc.md: Security, data protection, CDC methods
references/additional-features.md: Monitoring, cloud storage services, scenario templates, data types, Git terminal
references/modeling-advanced.md: Graph snippets, SAP cloud apps, configuration types, 141 graph templates

Templates

Starter templates are available in templates/:

templates/basic-graph.json: Simple data processing graph
templates/replication-flow.json: Data replication pattern
templates/ml-training-pipeline.json: ML training workflow

Documentation Links

Primary Sources:

GitHub Docs: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs
SAP Help Portal: https://help.sap.com/docs/SAP_DATA_INTELLIGENCE
SAP Developer Center: https://developers.sap.com/topics/data-intelligence.html

Section-Specific:

Modeling Guide: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/modelingguide

Bundled Resources

Reference Documentation

references/abap-integration.md: ABAP system integration guide
references/ml-scenario-manager.md: Machine Learning scenario manager
references/replication-flows.md: Data replication flow configuration
references/operators-reference.md: Complete operators reference
references/dtl-functions.md: Data Transformation Language functions
references/modeling-advanced.md: Advanced modeling techniques
references/structured-data-operators.md: Structured data operators guide

Documentation Links

ABAP Integration: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/abapintegration
Machine Learning: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/machinelearning
Function Reference: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/functionreference
Repository Objects: https://github.com/SAP-docs/sap-hana-cloud-data-intelligence/tree/main/docs/repositoryobjects

Version Information

Skill Version: 1.0.0
Last Updated: 2025-11-27
Documentation Source: SAP-docs/sap-hana-cloud-data-intelligence (GitHub)
