databricks-jobs


Lakeflow Jobs Development


FIRST: Use the parent `databricks` skill for CLI basics, authentication, profile selection, and data exploration commands.

Lakeflow Jobs are scheduled workflows that run notebooks, Python scripts, SQL queries, and other tasks on Databricks.

Scaffolding a New Job Project


Use `databricks bundle init` with a config file to scaffold non-interactively. This creates a project in the `<project_name>/` directory:

```bash
databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null
```

- `project_name`: letters, numbers, and underscores only

After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential for giving agents guidance on how to work with the project. Use this content:

Databricks Asset Bundles Project


This project uses Databricks Asset Bundles for deployment.

Prerequisites


Install the Databricks CLI (>= v0.288.0) if not already installed:

- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`

For AI Agents


Read the `databricks` skill for CLI basics, authentication, and the deployment workflow. Read the `databricks-jobs` skill for job-specific guidance.

If skills are not available, install them: `databricks experimental aitools skills install`

Project Structure


```
my-job-project/
├── databricks.yml              # Bundle configuration
├── resources/
│   └── my_job.job.yml          # Job definition
├── src/
│   ├── my_notebook.ipynb       # Notebook tasks
│   └── my_module/              # Python wheel package
│       ├── __init__.py
│       └── main.py
├── tests/
│   └── test_main.py
└── pyproject.toml              # Python project config (if using wheels)
```

Configuring Tasks


Edit `resources/<job_name>.job.yml` to configure tasks:

```yaml
resources:
  jobs:
    my_job:
      name: my_job

      tasks:
        - task_key: my_notebook
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb

        - task_key: my_python
          depends_on:
            - task_key: my_notebook
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`
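One easy mistake in hand-edited job YAML is a `depends_on` entry that names a task that doesn't exist. A minimal sketch of a sanity check you could run over a parsed job definition (the `tasks` list mirrors the YAML above after loading it with a YAML parser; the helper name is illustrative, not part of the bundle tooling):

```python
def undefined_dependencies(tasks):
    """Return depends_on task_keys that don't match any defined task."""
    keys = {t["task_key"] for t in tasks}
    return [
        dep["task_key"]
        for t in tasks
        for dep in t.get("depends_on", [])
        if dep["task_key"] not in keys
    ]

# Mirrors the two-task job defined above.
tasks = [
    {"task_key": "my_notebook"},
    {"task_key": "my_python", "depends_on": [{"task_key": "my_notebook"}]},
]
print(undefined_dependencies(tasks))  # → []
```

Note that `databricks bundle validate` catches schema-level errors, so a check like this is only useful when generating task definitions programmatically.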

Job Parameters


Parameters defined at the job level are passed to ALL tasks (no need to repeat them per task):

```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
```

Access parameters in notebooks with `dbutils.widgets.get("catalog")`.
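`dbutils` exists only inside a Databricks notebook, so code that calls `dbutils.widgets.get` directly can't run under local pytest. A minimal sketch of a fallback wrapper (the helper name `get_param` and the default values are illustrative, not a Databricks API):

```python
def get_param(name: str, default: str) -> str:
    """Read a job parameter on Databricks, or fall back to a default locally."""
    try:
        return dbutils.widgets.get(name)  # noqa: F821 — defined only on Databricks
    except NameError:
        return default  # running locally, e.g. under pytest

catalog = get_param("catalog", "dev_catalog")
schema = get_param("schema", "dev_schema")
print(f"{catalog}.{schema}.my_table")  # → dev_catalog.dev_schema.my_table
```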

Writing Notebook Code


```python
# Read parameters
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Read tables
df = spark.read.table(f"{catalog}.{schema}.my_table")

# SQL queries
result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")

# Write output
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
```

Scheduling


```yaml
resources:
  jobs:
    my_job:
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```

Or with cron:

```yaml
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"
        timezone_id: "UTC"
```

Multi-Task Jobs with Dependencies


```yaml
resources:
  jobs:
    my_pipeline_job:
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/extract.ipynb

        - task_key: transform
          depends_on:
            - task_key: extract
          notebook_task:
            notebook_path: ../src/transform.ipynb

        - task_key: load
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../src/load.ipynb
```
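The `depends_on` entries form a DAG: each task runs only after all of its dependencies succeed. A sketch of the resulting execution order, mapping each `task_key` above to its predecessors and topologically sorting with the standard library:

```python
from graphlib import TopologicalSorter

# task_key -> set of task_keys it depends on (mirrors the job above)
deps = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}
print(list(TopologicalSorter(deps).static_order()))
# → ['extract', 'transform', 'load']
```

Tasks with no dependency path between them can run in parallel; only the ordering constraints in `depends_on` are enforced.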

Unit Testing


Run unit tests locally:

```bash
uv run pytest
```
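A sketch of the pattern the scaffolded `tests/test_main.py` follows: keep transformation logic in plain Python functions so pytest can exercise them without a Spark session. The function and test names here are illustrative, not part of the scaffold:

```python
def fully_qualified(catalog: str, schema: str, table: str) -> str:
    """Build a three-level Unity Catalog table name."""
    return f"{catalog}.{schema}.{table}"

def test_fully_qualified():
    assert fully_qualified("main", "default", "orders") == "main.default.orders"
```

Logic that needs a Spark session is better covered by deploying to a dev target and running the job, as described in the workflow below.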

Development Workflow


1. Validate: `databricks bundle validate --profile <profile>`
2. Deploy: `databricks bundle deploy -t dev --profile <profile>`
3. Run: `databricks bundle run <job_name> -t dev --profile <profile>`
4. Check run status: `databricks jobs get-run --run-id <id> --profile <profile>`

Documentation
