adf-validation-rules

🚨 CRITICAL GUIDELINES

Windows File Path Requirements

MANDATORY: Always Use Backslashes on Windows for File Paths
When using Edit or Write tools on Windows, you MUST use backslashes (`\`) in file paths, NOT forward slashes (`/`).
Examples:
  • ❌ WRONG: `D:/repos/project/file.tsx`
  • ✅ CORRECT: `D:\repos\project\file.tsx`
This applies to:
  • Edit tool `file_path` parameter
  • Write tool `file_path` parameter
  • All file operations on Windows systems

Documentation Guidelines

NEVER create new documentation files unless explicitly requested by the user.
  • Priority: Update existing README.md files rather than creating new documentation
  • Repository cleanliness: Keep repository root clean - only README.md unless user requests otherwise
  • Style: Documentation should be concise, direct, and professional - avoid an AI-generated tone
  • User preference: Only create additional .md files when user specifically asks for documentation

Azure Data Factory Validation Rules and Limitations

🚨 CRITICAL: Activity Nesting Limitations

Azure Data Factory has STRICT nesting rules for control flow activities. Violating these rules will cause pipeline failures or prevent pipeline creation.

Supported Control Flow Activities for Nesting

Four control flow activities support nested activities:
  • ForEach: Iterates over a collection and executes activities in a loop
  • If Condition: Branches into true/false activity sets based on an expression
  • Until: Implements do-until loops with a timeout option (a minimal sketch follows this list)
  • Switch: Executes the set of activities whose case matches the evaluated expression
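As a rough illustration of the Until shape described above, the following sketch uses the documented `expression`/`timeout`/`activities` layout; the `status` variable and the Wait step are hypothetical placeholders.
```json
{
  "name": "Until_StatusDone",
  "type": "Until",
  "typeProperties": {
    "expression": {
      "value": "@equals(variables('status'), 'Done')",
      "type": "Expression"
    },
    "timeout": "0.01:00:00",
    "activities": [
      {
        "name": "Wait_BeforeRecheck",
        "type": "Wait",
        "typeProperties": { "waitTimeInSeconds": 30 }
      }
    ]
  }
}
```
The loop re-runs its inner activities until the expression evaluates true or the timeout elapses, whichever comes first.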

✅ PERMITTED Nesting Combinations

| Parent Activity | Can Contain | Notes |
|---|---|---|
| ForEach | If Condition | ✅ Allowed |
| ForEach | Switch | ✅ Allowed |
| Until | If Condition | ✅ Allowed |
| Until | Switch | ✅ Allowed |

❌ PROHIBITED Nesting Combinations

| Parent Activity | CANNOT Contain | Reason |
|---|---|---|
| If Condition | ForEach | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Switch | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Until | ❌ Not supported - use Execute Pipeline workaround |
| If Condition | Another If | ❌ Cannot nest If within If |
| Switch | ForEach | ❌ Not supported - use Execute Pipeline workaround |
| Switch | If Condition | ❌ Not supported - use Execute Pipeline workaround |
| Switch | Until | ❌ Not supported - use Execute Pipeline workaround |
| Switch | Another Switch | ❌ Cannot nest Switch within Switch |
| ForEach | Another ForEach | ❌ Single level only - use Execute Pipeline workaround |
| Until | Another Until | ❌ Single level only - use Execute Pipeline workaround |
| ForEach | Until | ❌ Single level only - use Execute Pipeline workaround |
| Until | ForEach | ❌ Single level only - use Execute Pipeline workaround |

🚫 Special Activity Restrictions

Validation Activity:
  • CANNOT be placed inside ANY nested activity
  • CANNOT be used within ForEach, If, Switch, or Until activities
  • ✅ Must be at pipeline root level only
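Since Validation activities are root-only, a minimal sketch of one at pipeline root follows; the dataset name `DS_InputBlob` and the thresholds are hypothetical.
```json
{
  "name": "Validate_InputFileExists",
  "type": "Validation",
  "typeProperties": {
    "dataset": {
      "referenceName": "DS_InputBlob",
      "type": "DatasetReference"
    },
    "timeout": "0.00:10:00",
    "sleep": 10,
    "minimumSize": 1
  }
}
```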

🔧 Workaround: Execute Pipeline Pattern

The ONLY supported workaround for prohibited nesting combinations:
Instead of direct nesting, use the Execute Pipeline Activity to call a child pipeline:
```json
{
  "name": "ParentPipeline_WithIfCondition",
  "activities": [
    {
      "name": "IfCondition_Parent",
      "type": "IfCondition",
      "typeProperties": {
        "expression": "@equals(pipeline().parameters.ProcessData, 'true')",
        "ifTrueActivities": [
          {
            "name": "ExecuteChildPipeline_WithForEach",
            "type": "ExecutePipeline",
            "typeProperties": {
              "pipeline": {
                "referenceName": "ChildPipeline_ForEachLoop",
                "type": "PipelineReference"
              },
              "parameters": {
                "ItemList": "@pipeline().parameters.Items"
              }
            }
          }
        ]
      }
    }
  ]
}
```
Child Pipeline Structure:
```json
{
  "name": "ChildPipeline_ForEachLoop",
  "parameters": {
    "ItemList": {"type": "array"}
  },
  "activities": [
    {
      "name": "ForEach_InChildPipeline",
      "type": "ForEach",
      "typeProperties": {
        "items": "@pipeline().parameters.ItemList",
        "activities": [
          // Your ForEach logic here
        ]
      }
    }
  ]
}
```
Why This Works:
  • Each pipeline can have ONE level of nesting
  • Execute Pipeline creates a new pipeline context
  • Child pipeline gets its own nesting level allowance
  • Enables unlimited depth through pipeline chaining

🔢 Activity and Resource Limits

Pipeline Limits

| Resource | Limit | Notes |
|---|---|---|
| Activities per pipeline | 120 | Includes inner activities for containers |
| Parameters per pipeline | 50 | - |
| ForEach concurrent iterations | 50 (maximum) | Set via `batchCount` property |
| ForEach items | 100,000 | - |
| Lookup activity rows | 5,000 | Maximum rows returned |
| Lookup activity size | 4 MB | Maximum size of returned data |
| Web activity timeout | 1 hour | Default timeout for Web activities |
| Copy activity timeout | 7 days | Maximum execution time |
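To make the Lookup limits concrete, here is a hedged sketch of a Lookup activity with `firstRowOnly: false`, the mode the 5,000-row/4 MB caps apply to; the dataset name and query are hypothetical.
```json
{
  "name": "Lookup_ConfigRows",
  "type": "Lookup",
  "typeProperties": {
    "source": {
      "type": "AzureSqlSource",
      "sqlReaderQuery": "SELECT TOP 1000 Id, Name FROM dbo.Config"
    },
    "dataset": {
      "referenceName": "DS_AzureSql_Config",
      "type": "DatasetReference"
    },
    "firstRowOnly": false
  }
}
```
Keeping the query well under the row and size caps avoids hitting the limits at runtime.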

ForEach Activity Configuration

```json
{
  "name": "ForEachActivity",
  "type": "ForEach",
  "typeProperties": {
    "items": "@pipeline().parameters.ItemList",
    "isSequential": false,  // false = parallel execution
    "batchCount": 50,       // Max 50 concurrent iterations
    "activities": [
      // Nested activities
    ]
  }
}
```
Critical Considerations:
  • `isSequential: true` → Executes one item at a time (slow but predictable)
  • `isSequential: false` → Executes up to `batchCount` items in parallel
  • Maximum `batchCount` is 50 regardless of setting
  • Cannot use a Set Variable activity inside a parallel ForEach (variable scope is pipeline-level)

Set Variable Activity Limitations

CANNOT use `Set Variable` inside a ForEach with `isSequential: false`:
  • Reason: Variables are pipeline-scoped, not ForEach-scoped
  • Multiple parallel iterations would cause race conditions
  • Alternative: Use `Append Variable` with an array-type variable, or switch to sequential execution (see the sketch below)
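A hedged sketch of the Append Variable alternative inside a parallel ForEach; the array variable `collectedNames` and the `@item().name` expression are hypothetical placeholders.
```json
{
  "name": "Append_ItemName",
  "type": "AppendVariable",
  "typeProperties": {
    "variableName": "collectedNames",
    "value": "@item().name"
  }
}
```
Because each iteration appends rather than overwrites a single value, parallel iterations avoid the race condition that Set Variable would introduce.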

📊 Linked Services: Azure Blob Storage

Authentication Methods

1. Account Key (Basic)

```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "connectionString": {
      "type": "SecureString",
      "value": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
    }
  }
}
```
⚠️ Limitations:
  • Secondary Blob service endpoints are NOT supported
  • Security Risk: Account keys should be stored in Azure Key Vault

2. Shared Access Signature (SAS)

```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "sasUri": {
      "type": "SecureString",
      "value": "https://<account>.blob.core.windows.net/<container>?<SAS-token>"
    }
  }
}
```
Critical Requirements:
  • Dataset `folderPath` must be an absolute path from the container level (see the dataset sketch below)
  • SAS token expiry must extend beyond pipeline execution
  • SAS URI path must align with the dataset configuration
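A hedged, abridged sketch of a DelimitedText dataset location illustrating the container-level `folderPath` requirement (`linkedServiceName` and schema omitted); the container, folder, and file names are hypothetical.
```json
{
  "type": "DelimitedText",
  "typeProperties": {
    "location": {
      "type": "AzureBlobStorageLocation",
      "container": "landing",
      "folderPath": "sales/2025/01",
      "fileName": "orders.csv"
    },
    "columnDelimiter": ",",
    "firstRowAsHeader": true
  }
}
```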

3. Service Principal

```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "serviceEndpoint": "https://<account>.blob.core.windows.net",
    "accountKind": "StorageV2",  // REQUIRED for service principal
    "servicePrincipalId": "<client-id>",
    "servicePrincipalCredential": {
      "type": "SecureString",
      "value": "<client-secret>"
    },
    "tenant": "<tenant-id>"
  }
}
```
Critical Requirements:
  • `accountKind` MUST be set (StorageV2, BlobStorage, or BlockBlobStorage)
  • Service principal requires the Storage Blob Data Reader (source) or Storage Blob Data Contributor (sink) role
  • NOT compatible with soft-deleted blob accounts in Data Flow

4. Managed Identity (Recommended)

```json
{
  "type": "AzureBlobStorage",
  "typeProperties": {
    "serviceEndpoint": "https://<account>.blob.core.windows.net",
    "accountKind": "StorageV2"  // REQUIRED for managed identity
  },
  "connectVia": {
    "referenceName": "AutoResolveIntegrationRuntime",
    "type": "IntegrationRuntimeReference"
  }
}
```
Critical Requirements:
  • `accountKind` MUST be specified (cannot be empty or "Storage")
  • ❌ An empty or "Storage" account kind will cause Data Flow failures
  • The managed identity must be assigned the Storage Blob Data Reader/Contributor role
  • For the Storage firewall: enable "Allow trusted Microsoft services"

Common Blob Storage Pitfalls

| Issue | Cause | Solution |
|---|---|---|
| Data Flow fails with managed identity | `accountKind` empty or "Storage" | Set `accountKind` to StorageV2 |
| Secondary endpoint doesn't work | Using account key auth | Not supported - use a different auth method |
| SAS token expired during run | Token expiry too short | Extend SAS token validity period |
| Cannot access $logs container | System container not visible in UI | Use a direct path reference |
| Soft-deleted blobs inaccessible | Service principal/managed identity | Use account key or SAS instead |
| Private endpoint connection fails | Wrong endpoint for Data Flow | Ensure an ADLS Gen2 private endpoint exists |

📊 Linked Services: Azure SQL Database

Authentication Methods

1. SQL Authentication

```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "SQL",
    "userName": "<username>",
    "password": {
      "type": "SecureString",
      "value": "<password>"
    }
  }
}
```
Best Practice:
  • Store the password in Azure Key Vault
  • Use a connection string with a Key Vault reference (see the sketch below)
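A hedged sketch of that pattern: the inline password is replaced with an `AzureKeyVaultSecret` reference; the linked service name `LS_KeyVault` and the secret name are hypothetical.
```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "SQL",
    "userName": "<username>",
    "password": {
      "type": "AzureKeyVaultSecret",
      "store": {
        "referenceName": "LS_KeyVault",
        "type": "LinkedServiceReference"
      },
      "secretName": "sql-admin-password"
    }
  }
}
```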

2. Service Principal

```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "ServicePrincipal",
    "servicePrincipalId": "<client-id>",
    "servicePrincipalCredential": {
      "type": "SecureString",
      "value": "<client-secret>"
    },
    "tenant": "<tenant-id>"
  }
}
```
Requirements:
  • A Microsoft Entra admin must be configured on the SQL server
  • A contained database user must be created for the service principal
  • Grant appropriate roles: `db_datareader`, `db_datawriter`, etc.

3. Managed Identity

```json
{
  "type": "AzureSqlDatabase",
  "typeProperties": {
    "server": "<server-name>.database.windows.net",
    "database": "<database-name>",
    "authenticationType": "SystemAssignedManagedIdentity"
  }
}
```
Requirements:
  • Create a contained database user for the managed identity
  • Grant appropriate database roles
  • Configure the firewall to allow Azure services (or specific IP ranges)

SQL Database Configuration Best Practices

Connection String Parameters

```
Server=tcp:<server>.database.windows.net,1433;
Database=<database>;
Encrypt=mandatory;          // Options: mandatory, optional, strict
TrustServerCertificate=false;
ConnectTimeout=30;
CommandTimeout=120;
Pooling=true;
ConnectRetryCount=3;
ConnectRetryInterval=10;
```
Critical Parameters:
  • `Encrypt`: Default is `mandatory` (recommended)
  • `Pooling`: Set to `false` if experiencing idle connection issues
  • `ConnectRetryCount`: Recommended for transient fault handling
  • `ConnectRetryInterval`: Seconds between retries

Common SQL Database Pitfalls

| Issue | Cause | Solution |
|---|---|---|
| Serverless tier auto-paused | Pipeline doesn't wait for resume | Implement retry logic or a keep-alive (see the sketch below) |
| Connection pool timeout | Idle connections closed | Add `Pooling=false` or configure retry |
| Firewall blocks connection | IP not whitelisted | Add Azure IR IPs or enable Azure services |
| Always Encrypted fails in Data Flow | Not supported for sink | Use service principal/managed identity in a copy activity |
| Decimal precision loss | Copy supports up to 28-digit precision | Use string type for higher precision |
| Parallel copy not working | No partition configuration | Enable physical or dynamic range partitioning |
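For the retry-logic row above, a hedged, abridged sketch of the standard activity-level `policy` block (dataset inputs/outputs omitted); the retry count and interval are illustrative values, not prescriptions.
```json
{
  "name": "Copy_FromServerlessSql",
  "type": "Copy",
  "policy": {
    "timeout": "0.01:00:00",
    "retry": 3,
    "retryIntervalInSeconds": 60
  },
  "typeProperties": {
    "source": { "type": "AzureSqlSource" },
    "sink": { "type": "ParquetSink" }
  }
}
```
A retry interval longer than the typical auto-resume delay gives a paused serverless database time to come back before the final attempt.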

Performance Optimization

Parallel Copy Configuration

```json
{
  "source": {
    "type": "AzureSqlSource",
    "partitionOption": "PhysicalPartitionsOfTable"  // or "DynamicRange"
  },
  "parallelCopies": 8,  // Recommended: (DIU or IR nodes) × (2 to 4)
  "enableStaging": true,
  "stagingSettings": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage",
      "type": "LinkedServiceReference"
    }
  }
}
```
Partition Options:
  • `PhysicalPartitionsOfTable`: Uses SQL Server physical partitions
  • `DynamicRange`: Creates logical partitions based on column values (see the sketch below)
  • `None`: No partitioning (default)
Staging Best Practices:
  • Always use staging for large data movements (> 1 GB)
  • Use PolyBase or the COPY statement for best performance
  • Parquet format is recommended for staging files
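A hedged sketch of the `DynamicRange` option named above, using the documented `partitionSettings` shape; the column `OrderId` and the bounds are hypothetical.
```json
{
  "source": {
    "type": "AzureSqlSource",
    "partitionOption": "DynamicRange",
    "partitionSettings": {
      "partitionColumnName": "OrderId",
      "partitionLowerBound": "1",
      "partitionUpperBound": "1000000"
    }
  },
  "parallelCopies": 8
}
```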

🔍 Data Flow Limitations

General Limits

  • Column name length: 128 characters maximum
  • Row size: 1 MB maximum (some sinks like SQL have lower limits)
  • String column size: Varies by sink (SQL: 8000 for varchar, 4000 for nvarchar)

Transformation-Specific Limits

特定转换限制

| Transformation | Limitation |
|---|---|
| Lookup | Cache size limited by cluster memory |
| Join | Large joins may cause memory errors |
| Pivot | Maximum 10,000 unique values |
| Window | Requires partitioning for large datasets |

Performance Considerations

性能注意事项

  • Partitioning: Always partition large datasets before transformations
  • Broadcast: Use broadcast hint for small dimension tables
  • Sink optimization: Enable table option "Recreate" instead of "Truncate" for better performance

🛡️ Validation Checklist for Pipeline Creation

Before Creating Pipeline

  • Verify activity nesting follows permitted combinations
  • Check ForEach activities don't contain other ForEach/Until
  • Verify If/Switch activities don't contain ForEach/Until/If/Switch
  • Ensure Validation activities are at pipeline root level only
  • Confirm total activities ≤ 120 per pipeline (the limit the validation script enforces)
  • Verify no Set Variable activities in parallel ForEach

Linked Service Validation

链接服务验证

  • Blob Storage: If using managed identity/service principal, `accountKind` is set
  • SQL Database: Authentication method matches security requirements
  • All services: Secrets stored in Key Vault, not hardcoded
  • All services: Firewall rules configured for integration runtime IPs
  • Network: Private endpoints configured if using VNet integration

Activity Configuration Validation

活动配置验证

  • ForEach: `batchCount` ≤ 50 if parallel execution
  • Lookup: Query returns < 5,000 rows and < 4 MB of data
  • Copy: DIU configured appropriately (2-256 for Azure IR)
  • Copy: Staging enabled for large data movements
  • All activities: Timeout values appropriate for expected execution time
  • All activities: Retry logic configured for transient failures

Data Flow Validation

数据流验证

  • Column names ≤ 128 characters
  • Source query doesn't return > 1 MB per row
  • Partitioning configured for large datasets
  • Sink has appropriate schema and data type mappings
  • Staging linked service configured for optimal performance

🔍 Automated Validation Script

🔍 自动化验证脚本

CRITICAL: Always run automated validation before committing or deploying ADF pipelines!
The adf-master plugin includes a comprehensive PowerShell validation script that checks for ALL the rules and limitations documented above.

Using the Validation Script

使用验证脚本

Location: `${CLAUDE_PLUGIN_ROOT}/scripts/validate-adf-pipelines.ps1`
Basic usage:
```powershell
# From the root of your ADF repository
pwsh -File validate-adf-pipelines.ps1
```
With custom paths:
```powershell
pwsh -File validate-adf-pipelines.ps1 `
    -PipelinePath "path/to/pipeline" `
    -DatasetPath "path/to/dataset"
```
With strict mode (additional warnings):
```powershell
pwsh -File validate-adf-pipelines.ps1 -Strict
```

What the Script Validates

The automated validation script checks for issues that Microsoft's official `@microsoft/azure-data-factory-utilities` package does NOT validate:
  1. Activity Nesting Violations:
    • ForEach → ForEach, Until, Validation
    • Until → Until, ForEach, Validation
    • IfCondition → ForEach, If, IfCondition, Switch, Until, Validation
    • Switch → ForEach, If, IfCondition, Switch, Until, Validation
  2. Resource Limits:
    • Pipeline activity count (max 120, warn at 100)
    • Pipeline parameter count (max 50)
    • Pipeline variable count (max 50)
    • ForEach batchCount limit (max 50, warn at 30 in strict mode)
  3. Variable Scope Violations:
    • SetVariable in parallel ForEach (causes race conditions)
    • Proper AppendVariable vs SetVariable usage
  4. Dataset Configuration Issues:
    • Missing fileName or wildcardFileName for file-based datasets
    • AzureBlobFSLocation missing required fileSystem property
    • Missing required properties for DelimitedText, Json, Parquet types
  5. Copy Activity Validations:
    • Source/sink type compatibility with dataset types
    • Lookup activity firstRowOnly=false warnings (5000 row/4MB limits)
    • Blob file dependencies (additionalColumns logging pattern)

Integration with CI/CD

GitHub Actions example:
```yaml
- name: Validate ADF Pipelines
  run: |
    pwsh -File validate-adf-pipelines.ps1 -PipelinePath pipeline -DatasetPath dataset
  shell: pwsh
```
Azure DevOps example:
```yaml
- task: PowerShell@2
  displayName: 'Validate ADF Pipelines'
  inputs:
    filePath: 'validate-adf-pipelines.ps1'
    arguments: '-PipelinePath pipeline -DatasetPath dataset'
    pwsh: true
```

Command Reference

Use the `/adf-validate` command to run the validation script with proper guidance:
```bash
/adf-validate
```
This command will:
  1. Detect your ADF repository structure
  2. Run the validation script with appropriate paths
  3. Parse and explain any errors or warnings found
  4. Provide specific solutions for each violation
  5. Recommend next actions based on results
  6. Suggest CI/CD integration patterns

Exit Codes

  • 0: Validation passed (no errors)
  • 1: Validation failed (errors found - DO NOT DEPLOY)

Best Practices

  1. Run validation before every commit to catch issues early
  2. Add validation to CI/CD pipeline to prevent invalid deployments
  3. Use strict mode during development for additional warnings
  4. Re-validate after bulk changes or generated pipelines
  5. Document validation exceptions if you must bypass a warning
  6. Share validation results with team to prevent repeated mistakes

🚨 CRITICAL: Enforcement Protocol

When creating or modifying ADF pipelines:
  1. ALWAYS validate activity nesting against the permitted/prohibited table
  2. REJECT any attempt to create prohibited nesting combinations
  3. SUGGEST Execute Pipeline workaround for complex nesting needs
  4. VALIDATE linked service authentication matches the connector type
  5. CHECK all limits (activities, parameters, ForEach iterations, etc.)
  6. VERIFY required properties are set (e.g., `accountKind` for managed identity)
  7. WARN about common pitfalls specific to the connector being used
Example Validation Response:
```
❌ INVALID PIPELINE STRUCTURE DETECTED:

Issue: ForEach activity contains another ForEach activity
Location: Pipeline "PL_DataProcessing" → ForEach "OuterLoop" → ForEach "InnerLoop"

This violates Azure Data Factory nesting rules:
- ForEach activities support only a SINGLE level of nesting
- You CANNOT nest ForEach within ForEach

✅ RECOMMENDED SOLUTION:
Use the Execute Pipeline pattern:
1. Create a child pipeline with the inner ForEach logic
2. Replace the inner ForEach with an Execute Pipeline activity
3. Pass required parameters to the child pipeline

Would you like me to generate the refactored pipeline structure?
```

📚 Reference Documentation

Official Microsoft Learn Resources:
Last Updated: 2025-01-24 (Based on official Microsoft documentation)
This validation rules skill MUST be consulted before creating or modifying ANY Azure Data Factory pipeline to ensure compliance with platform limitations and best practices.

Progressive Disclosure References

For detailed validation matrices and resource limits, see:
  • Nesting Rules: `references/nesting-rules.md` - Complete matrix of permitted and prohibited activity nesting combinations with workaround patterns
  • Resource Limits: `references/resource-limits.md` - Complete reference for all ADF limits (pipeline, activity, trigger, data flow, integration runtime, expression, API)