# adf-validation-rules
## 🚨 CRITICAL GUIDELINES
### Windows File Path Requirements
**MANDATORY: Always Use Backslashes on Windows for File Paths**
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
Examples:
- ❌ WRONG: `D:/repos/project/file.tsx`
- ✅ CORRECT: `D:\repos\project\file.tsx`
This applies to:
- Edit tool file_path parameter
- Write tool file_path parameter
- All file operations on Windows systems
### Documentation Guidelines
NEVER create new documentation files unless explicitly requested by the user.
- Priority: Update existing README.md files rather than creating new documentation
- Repository cleanliness: Keep repository root clean - only README.md unless user requests otherwise
- Style: Documentation should be concise, direct, and professional - avoid AI-generated tone
- User preference: Only create additional .md files when user specifically asks for documentation
## Azure Data Factory Validation Rules and Limitations
### 🚨 CRITICAL: Activity Nesting Limitations
Azure Data Factory has STRICT nesting rules for control flow activities. Violating these rules will cause pipeline failures or prevent pipeline creation.
#### Supported Control Flow Activities for Nesting
Four control flow activities support nested activities:
- ForEach: Iterates over collections and executes activities in a loop
- If Condition: Branches based on true/false evaluation
- Until: Implements do-until loops with timeout options
- Switch: Evaluates activities matching case conditions
#### ✅ PERMITTED Nesting Combinations
| Parent Activity | Can Contain | Notes |
|---|---|---|
| ForEach | If Condition | ✅ Allowed |
| ForEach | Switch | ✅ Allowed |
| Until | If Condition | ✅ Allowed |
| Until | Switch | ✅ Allowed |
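For instance, a ForEach that wraps an If Condition is valid. A minimal sketch in the same simplified JSON style as the examples below (activity names, the `Files` parameter, and the placeholder Wait activity are illustrative, not from any specific pipeline):

```json
{
  "name": "ForEach_Files",
  "type": "ForEach",
  "typeProperties": {
    "items": "@pipeline().parameters.Files",
    "activities": [
      {
        "name": "If_IsCsv",
        "type": "IfCondition",
        "typeProperties": {
          "expression": "@endswith(item().name, '.csv')",
          "ifTrueActivities": [
            {
              "name": "Wait_Placeholder",
              "type": "Wait",
              "typeProperties": { "waitTimeInSeconds": 1 }
            }
          ]
        }
      }
    ]
  }
}
```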
#### ❌ PROHIBITED Nesting Combinations
| Parent Activity | CANNOT Contain | Reason |
|---|---|---|
| If Condition | ForEach | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Switch | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Until | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Another If | ❌ Cannot nest If within If |
| Switch | ForEach | ❌ Not supported - use Execute Pipeline workaround |
| Switch | If Condition | ❌ Not supported - use Execute Pipeline workaround |
| Switch | Until | ❌ Not supported - use Execute Pipeline workaround |
| Switch | Another Switch | ❌ Cannot nest Switch within Switch |
| ForEach | Another ForEach | ❌ Single level only - use Execute Pipeline workaround |
| Until | Another Until | ❌ Single level only - use Execute Pipeline workaround |
| ForEach | Until | ❌ Single level only - use Execute Pipeline workaround |
| Until | ForEach | ❌ Single level only - use Execute Pipeline workaround |
#### 🚫 Special Activity Restrictions
**Validation Activity:**
- ❌ CANNOT be placed inside ANY nested activity
- ❌ CANNOT be used within ForEach, If, Switch, or Until activities
- ✅ Must be at pipeline root level only
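A sketch of a compliant placement, with the Validation activity at the pipeline root gating a downstream ForEach (pipeline, dataset, and activity names are hypothetical):

```json
{
  "name": "PL_WithValidation",
  "activities": [
    {
      "name": "Validate_InputFileExists",
      "type": "Validation",
      "typeProperties": {
        "dataset": { "referenceName": "DS_InputBlob", "type": "DatasetReference" },
        "timeout": "0.00:10:00",
        "sleep": 10,
        "minimumSize": 1
      }
    },
    {
      "name": "ForEach_Process",
      "type": "ForEach",
      "dependsOn": [
        { "activity": "Validate_InputFileExists", "dependencyConditions": ["Succeeded"] }
      ],
      "typeProperties": {
        "items": "@pipeline().parameters.Items",
        "activities": []
      }
    }
  ]
}
```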
#### 🔧 Workaround: Execute Pipeline Pattern
The ONLY supported workaround for prohibited nesting combinations:
Instead of direct nesting, use the Execute Pipeline Activity to call a child pipeline:
```json
{
  "name": "ParentPipeline_WithIfCondition",
  "activities": [
    {
      "name": "IfCondition_Parent",
      "type": "IfCondition",
      "typeProperties": {
        "expression": "@equals(pipeline().parameters.ProcessData, 'true')",
        "ifTrueActivities": [
          {
            "name": "ExecuteChildPipeline_WithForEach",
            "type": "ExecutePipeline",
            "typeProperties": {
              "pipeline": {
                "referenceName": "ChildPipeline_ForEachLoop",
                "type": "PipelineReference"
              },
              "parameters": {
                "ItemList": "@pipeline().parameters.Items"
              }
            }
          }
        ]
      }
    }
  ]
}
```

**Child Pipeline Structure:**
```json
{
  "name": "ChildPipeline_ForEachLoop",
  "parameters": {
    "ItemList": {"type": "array"}
  },
  "activities": [
    {
      "name": "ForEach_InChildPipeline",
      "type": "ForEach",
      "typeProperties": {
        "items": "@pipeline().parameters.ItemList",
        "activities": [
          // Your ForEach logic here
        ]
      }
    }
  ]
}
```

**Why This Works:**
- Each pipeline can have ONE level of nesting
- Execute Pipeline creates a new pipeline context
- Child pipeline gets its own nesting level allowance
- Enables unlimited depth through pipeline chaining
### 🔢 Activity and Resource Limits
#### Pipeline Limits
| Resource | Limit | Notes |
|---|---|---|
| Activities per pipeline | 80 | Includes inner activities for containers |
| Parameters per pipeline | 50 | - |
| ForEach concurrent iterations | 50 (maximum) | Set via `batchCount` |
| ForEach items | 100,000 | - |
| Lookup activity rows | 5,000 | Maximum rows returned |
| Lookup activity size | 4 MB | Maximum size of returned data |
| Web activity timeout | 1 hour | Default timeout for Web activities |
| Copy activity timeout | 7 days | Maximum execution time |
#### ForEach Activity Configuration
```json
{
  "name": "ForEachActivity",
  "type": "ForEach",
  "typeProperties": {
    "items": "@pipeline().parameters.ItemList",
    "isSequential": false,  // false = parallel execution
    "batchCount": 50,       // Max 50 concurrent iterations
    "activities": [
      // Nested activities
    ]
  }
}
```

**Critical Considerations:**
- `isSequential: true` → Executes one item at a time (slow but predictable)
- `isSequential: false` → Executes up to `batchCount` items in parallel
- `batchCount`: Maximum is 50 regardless of setting
- Cannot use Set Variable activity inside parallel ForEach (variable scope is pipeline-level)
#### Set Variable Activity Limitations
❌ CANNOT use `Set Variable` inside ForEach with `isSequential: false`
- Reason: Variables are pipeline-scoped, not ForEach-scoped
- Multiple parallel iterations would cause race conditions
- ✅ Alternative: Use `Append Variable` with an array-type variable, or use sequential execution
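As a sketch of the alternative, an Append Variable activity inside a parallel ForEach accumulates results into a pipeline-level array variable (activity and variable names here are illustrative):

```json
{
  "name": "ForEach_Parallel",
  "type": "ForEach",
  "typeProperties": {
    "items": "@pipeline().parameters.ItemList",
    "isSequential": false,
    "activities": [
      {
        "name": "Append_Result",
        "type": "AppendVariable",
        "typeProperties": {
          "variableName": "Results",   // must be an Array-type pipeline variable
          "value": "@item().name"
        }
      }
    ]
  }
}
```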
### 📊 Linked Services: Azure Blob Storage
#### Authentication Methods
**1. Account Key (Basic)**
```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "connectionString": {
      "type": "SecureString",
      "value": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```

**⚠️ Limitations:**
- Secondary Blob service endpoints are NOT supported
- Security Risk: Account keys should be stored in Azure Key Vault
**2. Shared Access Signature (SAS)**
```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "sasUri": {
      "type": "SecureString",
      "value": "https://<account>.blob.core.windows.net/<container>?<SAS-token>"
    }
  }
}
```

**Critical Requirements:**
- Dataset `folderPath` must be an absolute path from container level
- SAS token expiry must extend beyond pipeline execution
- SAS URI path must align with dataset configuration
**3. Service Principal**
```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "serviceEndpoint": "https://<account>.blob.core.windows.net",
    "accountKind": "StorageV2", // REQUIRED for service principal
    "servicePrincipalId": "<client-id>",
    "servicePrincipalCredential": {
      "type": "SecureString",
      "value": "<client-secret>"
    },
    "tenant": "<tenant-id>"
  }
}
```

**Critical Requirements:**
- `accountKind` MUST be set (StorageV2, BlobStorage, or BlockBlobStorage)
- Service Principal requires Storage Blob Data Reader (source) or Storage Blob Data Contributor (sink) role
- ❌ NOT compatible with soft-deleted blob accounts in Data Flow
**4. Managed Identity (Recommended)**
```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "serviceEndpoint": "https://<account>.blob.core.windows.net",
    "accountKind": "StorageV2" // REQUIRED for managed identity
  },
  "connectVia": {
    "referenceName": "AutoResolveIntegrationRuntime",
    "type": "IntegrationRuntimeReference"
  }
}
```

**Critical Requirements:**
- `accountKind` MUST be specified (cannot be empty or "Storage")
- ❌ Empty or "Storage" account kind will cause Data Flow failures
- Managed identity must have Storage Blob Data Reader/Contributor role assigned
- For Storage firewall: Must enable "Allow trusted Microsoft services"
#### Common Blob Storage Pitfalls
| Issue | Cause | Solution |
|---|---|---|
| Data Flow fails with managed identity | `accountKind` empty or set to "Storage" | Set `accountKind` to `StorageV2` |
| Secondary endpoint doesn't work | Using account key auth | Not supported - use different auth method |
| SAS token expired during run | Token expiry too short | Extend SAS token validity period |
| Cannot access $logs container | System container not visible in UI | Use direct path reference |
| Soft-deleted blobs inaccessible | Service principal/managed identity | Use account key or SAS instead |
| Private endpoint connection fails | Wrong endpoint for Data Flow | Ensure ADLS Gen2 private endpoint exists |
### 📊 Linked Services: Azure SQL Database
#### Authentication Methods
**1. SQL Authentication**
```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "SQL",
    "userName": "<username>",
    "password": {
      "type": "SecureString",
      "value": "<password>"
    }
  }
}
```

**Best Practice:**
- Store password in Azure Key Vault
- Use connection string with Key Vault reference
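A sketch of the Key Vault pattern, replacing the inline SecureString password with a secret reference (the linked service name `AzureKeyVaultLS` and the secret name are placeholders):

```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "SQL",
    "userName": "<username>",
    "password": {
      "type": "AzureKeyVaultSecret",
      "store": {
        "referenceName": "AzureKeyVaultLS",
        "type": "LinkedServiceReference"
      },
      "secretName": "sql-admin-password"
    }
  }
}
```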
**2. Service Principal**
```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "ServicePrincipal",
    "servicePrincipalId": "<client-id>",
    "servicePrincipalCredential": {
      "type": "SecureString",
      "value": "<client-secret>"
    },
    "tenant": "<tenant-id>"
  }
}
```

**Requirements:**
- Microsoft Entra admin must be configured on SQL server
- Service principal must have a contained database user created
- Grant appropriate roles: `db_datareader`, `db_datawriter`, etc.
**3. Managed Identity**
```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "SystemAssignedManagedIdentity"
  }
}
```

**Requirements:**
- Create contained database user for managed identity
- Grant appropriate database roles
- Configure firewall to allow Azure services (or specific IP ranges)
#### SQL Database Configuration Best Practices
##### Connection String Parameters
```
Server=tcp:<server>.database.windows.net,1433;
Database=<database>;
Encrypt=mandatory;        // Options: mandatory, optional, strict
TrustServerCertificate=false;
ConnectTimeout=30;
CommandTimeout=120;
Pooling=true;
ConnectRetryCount=3;
ConnectRetryInterval=10;
```

**Critical Parameters:**
- `Encrypt`: Default is `mandatory` (recommended)
- `Pooling`: Set to `false` if experiencing idle connection issues
- `ConnectRetryCount`: Recommended for transient fault handling
- `ConnectRetryInterval`: Seconds between retries
#### Common SQL Database Pitfalls
| Issue | Cause | Solution |
|---|---|---|
| Serverless tier auto-paused | Pipeline doesn't wait for resume | Implement retry logic or keep-alive |
| Connection pool timeout | Idle connections closed | Add `ConnectRetryCount`/`ConnectRetryInterval` or set `Pooling=false` |
| Firewall blocks connection | IP not whitelisted | Add Azure IR IPs or enable Azure services |
| Always Encrypted fails in Data Flow | Not supported for sink | Use service principal/managed identity in copy activity |
| Decimal precision loss | Copy supports up to 28 precision | Use string type for higher precision |
| Parallel copy not working | No partition configuration | Enable physical or dynamic range partitioning |
#### Performance Optimization
##### Parallel Copy Configuration
```json
{
  "source": {
    "type": "AzureSqlSource",
    "partitionOption": "PhysicalPartitionsOfTable" // or "DynamicRange"
  },
  "parallelCopies": 8, // Recommended: (DIU or IR nodes) × (2 to 4)
  "enableStaging": true,
  "stagingSettings": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage",
      "type": "LinkedServiceReference"
    }
  }
}
```

**Partition Options:**
- `PhysicalPartitionsOfTable`: Uses SQL Server physical partitions
- `DynamicRange`: Creates logical partitions based on column values
- `None`: No partitioning (default)
**Staging Best Practices:**
- Always use staging for large data movements (> 1GB)
- Use PolyBase or COPY statement for best performance
- Parquet format recommended for staging files
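A sketch of the `DynamicRange` variant, which partitions on a column rather than physical partitions (the table, column, and bounds are illustrative; the `?AdfDynamicRangePartitionCondition` placeholder is substituted by the service at runtime):

```json
{
  "source": {
    "type": "AzureSqlSource",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
      "partitionColumnName": "OrderId",
      "partitionLowerBound": "1",
      "partitionUpperBound": "1000000"
    },
    "sqlReaderQuery": "SELECT * FROM dbo.Orders WHERE ?AdfDynamicRangePartitionCondition"
  },
  "parallelCopies": 8
}
```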
### 🔍 Data Flow Limitations
#### General Limits
- Column name length: 128 characters maximum
- Row size: 1 MB maximum (some sinks like SQL have lower limits)
- String column size: Varies by sink (SQL: 8000 for varchar, 4000 for nvarchar)
#### Transformation-Specific Limits
| Transformation | Limitation |
|---|---|
| Lookup | Cache size limited by cluster memory |
| Join | Large joins may cause memory errors |
| Pivot | Maximum 10,000 unique values |
| Window | Requires partitioning for large datasets |
#### Performance Considerations
- Partitioning: Always partition large datasets before transformations
- Broadcast: Use broadcast hint for small dimension tables
- Sink optimization: Enable table option "Recreate" instead of "Truncate" for better performance
### 🛡️ Validation Checklist for Pipeline Creation
#### Before Creating Pipeline
- Verify activity nesting follows permitted combinations
- Check ForEach activities don't contain other ForEach/Until
- Verify If/Switch activities don't contain ForEach/Until/If/Switch
- Ensure Validation activities are at pipeline root level only
- Confirm total activities < 80 per pipeline
- Verify no Set Variable activities in parallel ForEach
#### Linked Service Validation
- Blob Storage: If using managed identity/service principal, `accountKind` is set
- SQL Database: Authentication method matches security requirements
- All services: Secrets stored in Key Vault, not hardcoded
- All services: Firewall rules configured for integration runtime IPs
- Network: Private endpoints configured if using VNet integration
#### Activity Configuration Validation
- ForEach: `batchCount` ≤ 50 if parallel execution
- Lookup: Query returns < 5000 rows and < 4 MB data
- Copy: DIU configured appropriately (2-256 for Azure IR)
- Copy: Staging enabled for large data movements
- All activities: Timeout values appropriate for expected execution time
- All activities: Retry logic configured for transient failures
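Timeout and retry are configured per activity via its `policy` block; a sketch for a Copy activity (the values shown are illustrative, not recommendations for any particular workload):

```json
{
  "name": "Copy_WithRetry",
  "type": "Copy",
  "policy": {
    "timeout": "0.02:00:00",     // d.hh:mm:ss format
    "retry": 3,                  // retry attempts on transient failures
    "retryIntervalInSeconds": 60
  },
  "typeProperties": {
    // source/sink configuration here
  }
}
```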
#### Data Flow Validation
- Column names ≤ 128 characters
- Source query doesn't return > 1 MB per row
- Partitioning configured for large datasets
- Sink has appropriate schema and data type mappings
- Staging linked service configured for optimal performance
### 🔍 Automated Validation Script
**CRITICAL**: Always run automated validation before committing or deploying ADF pipelines!
The adf-master plugin includes a comprehensive PowerShell validation script that checks for ALL the rules and limitations documented above.
#### Using the Validation Script
**Location:** `${CLAUDE_PLUGIN_ROOT}/scripts/validate-adf-pipelines.ps1`

**Basic usage:**
```powershell
# From the root of your ADF repository
pwsh -File validate-adf-pipelines.ps1
```

**With custom paths:**
```powershell
pwsh -File validate-adf-pipelines.ps1 `
  -PipelinePath "path/to/pipeline" `
  -DatasetPath "path/to/dataset"
```

**With strict mode (additional warnings):**
```powershell
pwsh -File validate-adf-pipelines.ps1 -Strict
```

#### What the Script Validates
The automated validation script checks for issues that Microsoft's official `@microsoft/azure-data-factory-utilities` package does NOT validate:
1. **Activity Nesting Violations:**
   - ForEach → ForEach, Until, Validation
   - Until → Until, ForEach, Validation
   - IfCondition → ForEach, If, IfCondition, Switch, Until, Validation
   - Switch → ForEach, If, IfCondition, Switch, Until, Validation
2. **Resource Limits:**
   - Pipeline activity count (max 120, warn at 100)
   - Pipeline parameter count (max 50)
   - Pipeline variable count (max 50)
   - ForEach batchCount limit (max 50, warn at 30 in strict mode)
3. **Variable Scope Violations:**
   - SetVariable in parallel ForEach (causes race conditions)
   - Proper AppendVariable vs SetVariable usage
4. **Dataset Configuration Issues:**
   - Missing fileName or wildcardFileName for file-based datasets
   - AzureBlobFSLocation missing required fileSystem property
   - Missing required properties for DelimitedText, Json, Parquet types
5. **Copy Activity Validations:**
   - Source/sink type compatibility with dataset types
   - Lookup activity firstRowOnly=false warnings (5000 row/4MB limits)
   - Blob file dependencies (additionalColumns logging pattern)
#### Integration with CI/CD
**GitHub Actions example:**
```yaml
- name: Validate ADF Pipelines
  run: |
    pwsh -File validate-adf-pipelines.ps1 -PipelinePath pipeline -DatasetPath dataset
  shell: pwsh
```

**Azure DevOps example:**
```yaml
- task: PowerShell@2
  displayName: 'Validate ADF Pipelines'
  inputs:
    filePath: 'validate-adf-pipelines.ps1'
    arguments: '-PipelinePath pipeline -DatasetPath dataset'
    pwsh: true
```

#### Command Reference
Use the `/adf-validate` command to run the validation script with proper guidance:

```bash
/adf-validate
```

This command will:
- Detect your ADF repository structure
- Run the validation script with appropriate paths
- Parse and explain any errors or warnings found
- Provide specific solutions for each violation
- Recommend next actions based on results
- Suggest CI/CD integration patterns
#### Exit Codes
- 0: Validation passed (no errors)
- 1: Validation failed (errors found - DO NOT DEPLOY)
#### Best Practices
- Run validation before every commit to catch issues early
- Add validation to CI/CD pipeline to prevent invalid deployments
- Use strict mode during development for additional warnings
- Re-validate after bulk changes or generated pipelines
- Document validation exceptions if you must bypass a warning
- Share validation results with team to prevent repeated mistakes
### 🚨 CRITICAL: Enforcement Protocol
When creating or modifying ADF pipelines:
- ALWAYS validate activity nesting against the permitted/prohibited table
- REJECT any attempt to create prohibited nesting combinations
- SUGGEST Execute Pipeline workaround for complex nesting needs
- VALIDATE linked service authentication matches the connector type
- CHECK all limits (activities, parameters, ForEach iterations, etc.)
- VERIFY required properties are set (e.g., `accountKind` for managed identity)
- WARN about common pitfalls specific to the connector being used
**Example Validation Response:**

```text
❌ INVALID PIPELINE STRUCTURE DETECTED:

Issue: ForEach activity contains another ForEach activity
Location: Pipeline "PL_DataProcessing" → ForEach "OuterLoop" → ForEach "InnerLoop"

This violates Azure Data Factory nesting rules:
- ForEach activities support only a SINGLE level of nesting
- You CANNOT nest ForEach within ForEach

✅ RECOMMENDED SOLUTION:
Use the Execute Pipeline pattern:
1. Create a child pipeline with the inner ForEach logic
2. Replace the inner ForEach with an Execute Pipeline activity
3. Pass required parameters to the child pipeline

Would you like me to generate the refactored pipeline structure?
```
### 📚 Reference Documentation
Official Microsoft Learn Resources:
- Activity nesting: https://learn.microsoft.com/en-us/azure/data-factory/concepts-nested-activities
- Blob Storage connector: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-blob-storage
- SQL Database connector: https://learn.microsoft.com/en-us/azure/data-factory/connector-azure-sql-database
- Pipeline limits: https://learn.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits#data-factory-limits
Last Updated: 2025-01-24 (Based on official Microsoft documentation)
This validation rules skill MUST be consulted before creating or modifying ANY Azure Data Factory pipeline to ensure compliance with platform limitations and best practices.
### Progressive Disclosure References
For detailed validation matrices and resource limits, see:
- **Nesting Rules**: `references/nesting-rules.md` - Complete matrix of permitted and prohibited activity nesting combinations with workaround patterns
- **Resource Limits**: `references/resource-limits.md` - Complete reference for all ADF limits (pipeline, activity, trigger, data flow, integration runtime, expression, API)