| Use Case | Reference File |
|---|---|
| Configure task types (notebook, Python, SQL, dbt, etc.) | task-types.md |
| Set up triggers and schedules | triggers-schedules.md |
| Configure notifications and health monitoring | notifications-monitoring.md |
| Complete working examples | examples.md |

    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.jobs import Task, NotebookTask, Source

    w = WorkspaceClient()

    job = w.jobs.create(
        name="my-etl-job",
        tasks=[
            Task(
                task_key="extract",
                notebook_task=NotebookTask(
                    notebook_path="/Workspace/Users/user@example.com/extract",
                    source=Source.WORKSPACE,
                ),
            )
        ],
    )
    print(f"Created job: {job.job_id}")

    databricks jobs create --json '{
      "name": "my-etl-job",
      "tasks": [{
        "task_key": "extract",
        "notebook_task": {
          "notebook_path": "/Workspace/Users/user@example.com/extract",
          "source": "WORKSPACE"
        }
      }]
    }'
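
Hand-written JSON inside a shell string is easy to break with a stray quote. As a minimal sketch, the same payload can be built as a Python dict and serialized before it is passed to the CLI. The helper name `build_job_payload` is hypothetical; the field names are the ones used in the command above.

```python
import json
import shlex


def build_job_payload(name, task_key, notebook_path):
    """Return a single-task job spec mirroring the JSON passed to the CLI above."""
    return {
        "name": name,
        "tasks": [
            {
                "task_key": task_key,
                "notebook_task": {
                    "notebook_path": notebook_path,
                    "source": "WORKSPACE",
                },
            }
        ],
    }


payload = build_job_payload(
    "my-etl-job",
    "extract",
    "/Workspace/Users/user@example.com/extract",
)

# shlex.quote wraps the JSON so a POSIX shell passes it as one argument.
command = "databricks jobs create --json " + shlex.quote(json.dumps(payload))
print(command)
```

Building the dict first keeps the quoting concerns in one place; the printed command can then be pasted into a shell or passed to a subprocess.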

    tasks:
      - task_key: extract
        notebook_task:
          notebook_path: ../src/extract.py

      - task_key: transform
        depends_on:
          - task_key: extract
        notebook_task:
          notebook_path: ../src/transform.py

      - task_key: load
        depends_on:
          - task_key: transform
        run_if: ALL_SUCCESS  # Only run if all dependencies succeed
        notebook_task:
          notebook_path: ../src/load.py

`run_if` values: `ALL_SUCCESS`, `ALL_DONE`, `AT_LEAST_ONE_SUCCESS`, `NONE_FAILED`, `ALL_FAILED`, `AT_LEAST_ONE_FAILED`
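
To make the `run_if` conditions concrete, here is a simplified model of how each value could be evaluated against the outcomes of a task's upstream dependencies. This is an illustration, not how Databricks implements it: the outcome labels (`"success"`, `"failed"`, `"skipped"`) and the function name `should_run` are assumptions, and `NONE_FAILED` is modeled as "no failures and at least one dependency executed".

```python
def should_run(run_if, upstream_outcomes):
    """Decide whether a task runs, given its run_if value and upstream outcomes.

    upstream_outcomes is a list of "success", "failed", or "skipped"
    (hypothetical labels used only for this sketch).
    """
    succeeded = [o == "success" for o in upstream_outcomes]
    failed = [o == "failed" for o in upstream_outcomes]
    executed = [o in ("success", "failed") for o in upstream_outcomes]

    rules = {
        "ALL_SUCCESS": all(succeeded),
        # Every dependency has finished one way or another by the time we ask.
        "ALL_DONE": True,
        "AT_LEAST_ONE_SUCCESS": any(succeeded),
        "NONE_FAILED": not any(failed) and any(executed),
        "ALL_FAILED": all(failed),
        "AT_LEAST_ONE_FAILED": any(failed),
    }
    return rules[run_if]


# The "load" task above uses ALL_SUCCESS with a single upstream task.
print(should_run("ALL_SUCCESS", ["success"]))
print(should_run("ALL_SUCCESS", ["failed"]))
```

A cleanup or alerting task would typically use `ALL_DONE` or `AT_LEAST_ONE_FAILED` so that it still runs when an upstream task fails.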

| Task Type | Use Case | Reference |
|---|---|---|
| `notebook_task` | Run notebooks | task-types.md#notebook-task |
| `spark_python_task` | Run Python scripts | task-types.md#spark-python-task |
| `python_wheel_task` | Run Python wheels | task-types.md#python-wheel-task |
| `sql_task` | Run SQL queries/files | task-types.md#sql-task |
| `dbt_task` | Run dbt projects | task-types.md#dbt-task |
| `pipeline_task` | Trigger DLT/SDP pipelines | task-types.md#pipeline-task |
| `spark_jar_task` | Run Spark JARs | task-types.md#spark-jar-task |
| `run_job_task` | Trigger other jobs | task-types.md#run-job-task |
| `for_each_task` | Loop over inputs | task-types.md#for-each-task |

| Trigger Type | Use Case | Reference |
|---|---|---|
| `schedule` | Cron-based scheduling | triggers-schedules.md#cron-schedule |
| `trigger.periodic` | Interval-based | triggers-schedules.md#periodic-trigger |
| `trigger.file_arrival` | File arrival events | triggers-schedules.md#file-arrival-trigger |
| `trigger.table_update` | Table change events | triggers-schedules.md#table-update-trigger |
| `continuous` | Always-running jobs | triggers-schedules.md#continuous-jobs |

    job_clusters:
      - job_cluster_key: shared_cluster
        new_cluster:
          spark_version: "15.4.x-scala2.12"
          node_type_id: "i3.xlarge"
          num_workers: 2
          spark_conf:
            spark.speculation: "true"

    tasks:
      - task_key: my_task
        job_cluster_key: shared_cluster
        notebook_task:
          notebook_path: ../src/notebook.py
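
A task's `job_cluster_key` has to name an entry under `job_clusters`. The sketch below checks that on a job spec loaded into a plain dict (for example from the YAML above). The function name `undefined_cluster_keys` is hypothetical.

```python
def undefined_cluster_keys(job):
    """Return job_cluster_key values used by tasks but not defined in job_clusters."""
    defined = {c["job_cluster_key"] for c in job.get("job_clusters", [])}
    used = {
        t["job_cluster_key"]
        for t in job.get("tasks", [])
        if "job_cluster_key" in t
    }
    return sorted(used - defined)


job = {
    "job_clusters": [{"job_cluster_key": "shared_cluster", "new_cluster": {}}],
    "tasks": [
        {"task_key": "my_task", "job_cluster_key": "shared_cluster"},
        {"task_key": "other_task", "job_cluster_key": "shared_clustr"},  # typo
    ],
}

print(undefined_cluster_keys(job))
```

Running a check like this before deployment surfaces a misspelled key locally instead of at job creation time.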

    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "i3.xlarge"
      autoscale:
        min_workers: 2
        max_workers: 8

    tasks:
      - task_key: my_task
        existing_cluster_id: "0123-456789-abcdef12"
        notebook_task:
          notebook_path: ../src/notebook.py

    tasks:
      - task_key: serverless_task
        notebook_task:
          notebook_path: ../src/notebook.py
        # No cluster config = serverless
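
The compute snippets above differ only in which cluster field a task carries. As a rough sketch, the rule "no cluster config = serverless" can be written as a small classifier over a task dict. The function name `compute_kind` and the returned labels are hypothetical, and treating a task-level `new_cluster` as a dedicated cluster is an assumption.

```python
def compute_kind(task):
    """Classify which compute a task would use, based on its cluster fields."""
    if "job_cluster_key" in task:
        return "job cluster"
    if "new_cluster" in task:
        return "new cluster"
    if "existing_cluster_id" in task:
        return "existing cluster"
    # None of the cluster fields is set.
    return "serverless"


tasks = [
    {"task_key": "my_task", "job_cluster_key": "shared_cluster"},
    {"task_key": "my_task", "existing_cluster_id": "0123-456789-abcdef12"},
    {"task_key": "serverless_task"},
]

for task in tasks:
    print(task["task_key"], "->", compute_kind(task))
```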

    parameters:
      - name: env
        default: "dev"
      - name: date
        default: "{{start_date}}"  # Dynamic value reference

    tasks:
      - task_key: my_task
        notebook_task:
          notebook_path: ../src/notebook.py
          base_parameters:
            env: "{{job.parameters.env}}"
            custom_param: "value"
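
Databricks resolves `{{job.parameters.<name>}}` references when the job runs. To show what the task ends up seeing, the sketch below mimics that substitution locally with a regular expression. It is an illustration only: the function name `resolve_base_parameters` is hypothetical and only the `job.parameters.*` form is handled.

```python
import re

# Matches references such as {{job.parameters.env}}.
_JOB_PARAM_REF = re.compile(r"\{\{\s*job\.parameters\.(\w+)\s*\}\}")


def resolve_base_parameters(base_parameters, job_parameters):
    """Replace job parameter references in task parameters with their values."""

    def substitute(value):
        # A reference to an undefined job parameter raises KeyError.
        return _JOB_PARAM_REF.sub(lambda m: job_parameters[m.group(1)], value)

    return {key: substitute(value) for key, value in base_parameters.items()}


job_parameters = {"env": "dev"}
base_parameters = {"env": "{{job.parameters.env}}", "custom_param": "value"}

resolved = resolve_base_parameters(base_parameters, job_parameters)
print(resolved)
```

Literal values such as `custom_param` pass through unchanged; only the reference is replaced with the job-level default or override.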

    from databricks.sdk import WorkspaceClient

    w = WorkspaceClient()

    resources:
      jobs:
        my_job:
          name: "My Job"
          permissions:
            - level: CAN_VIEW
              group_name: "data-analysts"
            - level: CAN_MANAGE_RUN
              group_name: "data-engineers"
            - level: CAN_MANAGE
              user_name: "admin@example.com"

Permission levels: `CAN_VIEW`, `CAN_MANAGE_RUN`, `CAN_MANAGE`
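
A permissions list can be sanity-checked before deployment. The sketch below flags unknown levels, entries without exactly one principal, and attempts to set permissions for the `admins` group, which the troubleshooting notes say cannot be modified. The function name `permission_problems` is hypothetical, and `service_principal_name` is an assumed third principal key that does not appear in the example above.

```python
KNOWN_LEVELS = ("CAN_VIEW", "CAN_MANAGE_RUN", "CAN_MANAGE")
# service_principal_name is an assumption; the example only shows the first two.
PRINCIPAL_KEYS = ("user_name", "group_name", "service_principal_name")


def permission_problems(permissions):
    """Return human-readable problems found in a job's permissions list."""
    problems = []
    for index, entry in enumerate(permissions):
        level = entry.get("level")
        if level not in KNOWN_LEVELS:
            problems.append(f"entry {index}: unknown level {level!r}")
        principals = [key for key in PRINCIPAL_KEYS if key in entry]
        if len(principals) != 1:
            problems.append(
                f"entry {index}: expected exactly one principal, found {len(principals)}"
            )
        if entry.get("group_name") == "admins":
            problems.append(f"entry {index}: admins group permissions cannot be modified")
    return problems


permissions = [
    {"level": "CAN_VIEW", "group_name": "data-analysts"},
    {"level": "CAN_MANAGE_RUN", "group_name": "data-engineers"},
    {"level": "CAN_MANAGE", "user_name": "admin@example.com"},
]

print(permission_problems(permissions))
```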

| Issue | Solution |
|---|---|
| Job cluster startup slow | Use job clusters with |
| Task dependencies not working | Verify |
| Schedule not triggering | Check |
| File arrival not detecting | Ensure path has proper permissions and uses cloud storage URL |
| Table update trigger missing events | Verify Unity Catalog table and proper grants |
| Parameter not accessible | Use |
| "admins" group error | Cannot modify admins permissions on jobs |
| Serverless task fails | Ensure task type supports serverless (notebook, Python) |
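
For the dependency issue in the table, one check that can be run locally is to confirm that every `depends_on` entry names an existing `task_key` and that the graph has no cycle. The sketch below uses the standard-library `graphlib` module (Python 3.9+) on a task list loaded into plain dicts. The function name `execution_order` is hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(tasks):
    """Return task keys in a valid run order, or raise if dependencies are broken.

    Raises ValueError for a depends_on key that matches no task, and
    graphlib.CycleError if the dependencies form a cycle.
    """
    known = {task["task_key"] for task in tasks}
    graph = {}
    for task in tasks:
        deps = [dep["task_key"] for dep in task.get("depends_on", [])]
        missing = [dep for dep in deps if dep not in known]
        if missing:
            raise ValueError(f"{task['task_key']} depends on unknown task(s): {missing}")
        graph[task["task_key"]] = deps
    return list(TopologicalSorter(graph).static_order())


tasks = [
    {"task_key": "extract"},
    {"task_key": "transform", "depends_on": [{"task_key": "extract"}]},
    {"task_key": "load", "depends_on": [{"task_key": "transform"}]},
]

print(execution_order(tasks))
```

A mismatch such as `Extract` versus `extract` is reported as an unknown task rather than silently producing a job whose tasks never wait on each other.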