# databricks-jobs
## Lakeflow Jobs Development

FIRST: Use the parent `databricks` skill for CLI basics, authentication, profile selection, and data exploration commands.

Lakeflow Jobs are scheduled workflows that run notebooks, Python scripts, SQL queries, and other tasks on Databricks.
## Scaffolding a New Job Project

Use `databricks bundle init` with a config file to scaffold non-interactively. This creates a project in the `<project_name>/` directory:

```bash
databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null
```

- `project_name`: letters, numbers, underscores only

After scaffolding, create `CLAUDE.md` and `AGENTS.md` in the project directory. These files are essential to provide agents with guidance on how to work with the project. Use this content:
## Databricks Asset Bundles Project
This project uses Databricks Asset Bundles for deployment.
## Prerequisites

Install the Databricks CLI (>= v0.288.0) if not already installed:

- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`

Verify: `databricks -v`
## For AI Agents
Read the `databricks` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.

If skills are not available, install them:

```bash
databricks experimental aitools skills install
```
## Project Structure
```
my-job-project/
├── databricks.yml            # Bundle configuration
├── resources/
│   └── my_job.job.yml        # Job definition
├── src/
│   ├── my_notebook.ipynb     # Notebook tasks
│   └── my_module/            # Python wheel package
│       ├── __init__.py
│       └── main.py
├── tests/
│   └── test_main.py
└── pyproject.toml            # Python project config (if using wheels)
```
## Configuring Tasks
Edit `resources/<job_name>.job.yml` to configure tasks:

```yaml
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: my_notebook
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
        - task_key: my_python
          depends_on:
            - task_key: my_notebook
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Task types: `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, `sql_task`
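The `python_wheel_task` above calls a named entry point in the packaged wheel. A minimal sketch of what `src/my_module/main.py` might contain (the function body is illustrative, not the scaffold's actual code):

```python
# src/my_module/main.py -- sketch of an entry point for a python_wheel_task.
# The scaffolded module's real contents may differ.
import sys


def main() -> None:
    # For wheel tasks, job parameters arrive as command-line arguments.
    args = sys.argv[1:]
    print(f"Running with args: {args}")


if __name__ == "__main__":
    main()
```

For `entry_point: main` to resolve, the function typically has to be registered in the wheel's entry-point metadata, e.g. `main = "my_module.main:main"` under `[project.scripts]` in `pyproject.toml`.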
## Job Parameters
Parameters defined at the job level are passed to ALL tasks (no need to repeat them per task):

```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
```

Access parameters in notebooks with `dbutils.widgets.get("catalog")`.
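Since `dbutils` only exists inside a Databricks runtime, a common pattern (a sketch, not part of the scaffold) is a small wrapper that falls back to a default when developing locally:

```python
# Sketch: read a job parameter, falling back to a default outside Databricks.
# `dbutils` is injected by the Databricks runtime and is undefined locally.
def get_param(name: str, default: str) -> str:
    try:
        return dbutils.widgets.get(name)  # available in Databricks notebooks
    except NameError:
        return default  # not running on Databricks


catalog = get_param("catalog", "main")  # "main" is an assumed local default
```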
## Writing Notebook Code
```python
# Read parameters
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Read tables
df = spark.read.table(f"{catalog}.{schema}.my_table")

# SQL queries
result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")

# Write output
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
```
## Scheduling
```yaml
resources:
  jobs:
    my_job:
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```

Or with cron:

```yaml
schedule:
  quartz_cron_expression: "0 0 2 * * ?"
  timezone_id: "UTC"
```
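A note on the cron form: Quartz expressions are seconds-first (`sec min hour day-of-month month day-of-week`), so `"0 0 2 * * ?"` means 02:00 daily. A sketch of a weekday schedule, kept defined but inactive via `pause_status` (a field of the Jobs `CronSchedule` object):

```yaml
schedule:
  # Quartz fields: sec min hour day-of-month month day-of-week
  quartz_cron_expression: "0 30 6 ? * MON-FRI"  # 06:30 on weekdays (example)
  timezone_id: "America/New_York"
  pause_status: PAUSED  # set to UNPAUSED to activate
```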
## Multi-Task Jobs with Dependencies
```yaml
resources:
  jobs:
    my_pipeline_job:
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/extract.ipynb
        - task_key: transform
          depends_on:
            - task_key: extract
          notebook_task:
            notebook_path: ../src/transform.ipynb
        - task_key: load
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../src/load.ipynb
```
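The Jobs scheduler resolves `depends_on` into an execution order server-side. Purely as an illustration of the DAG this induces, the same ordering can be sketched with Python's standard-library `graphlib`:

```python
# Sketch: the depends_on graph of extract -> transform -> load, resolved
# to an execution order. Illustrative only -- the scheduler does this for you.
from graphlib import TopologicalSorter

# task_key -> set of tasks it depends on (mirrors the YAML)
deps = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}
order = list(TopologicalSorter(deps).static_order())
print(order)  # ['extract', 'transform', 'load']
```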
## Unit Testing
Run unit tests locally:

```bash
uv run pytest
```
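Tests in `tests/` should target pure-Python logic that needs no Spark session. A minimal sketch of the shape of such a test (the helper function is hypothetical, not part of the scaffold):

```python
# tests/test_main.py -- sketch; build_table_name is a hypothetical helper
# shown here to illustrate a Spark-free unit test.

def build_table_name(catalog: str, schema: str, table: str) -> str:
    return f"{catalog}.{schema}.{table}"


def test_build_table_name() -> None:
    assert build_table_name("main", "dev", "orders") == "main.dev.orders"
```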
## Development Workflow
- Validate: `databricks bundle validate --profile <profile>`
- Deploy: `databricks bundle deploy -t dev --profile <profile>`
- Run: `databricks bundle run <job_name> -t dev --profile <profile>`
- Check run status: `databricks jobs get-run --run-id <id> --profile <profile>`
## Documentation
- Lakeflow Jobs: https://docs.databricks.com/jobs
- Task types: https://docs.databricks.com/jobs/configure-task
- Databricks Asset Bundles: https://docs.databricks.com/dev-tools/bundles/examples