Develop and deploy Lakeflow Jobs on Databricks. Use when creating data engineering jobs with notebooks, Python wheels, or SQL tasks. Invoke BEFORE starting implementation.
Install the skill:

```bash
npx skill4agent add databricks/databricks-agent-skills databricks-jobs
```

Scaffold a new project with `databricks bundle init`. Running it non-interactively answers the template prompts from a JSON config and creates a `<project_name>/` directory:

```bash
databricks bundle init default-python --config-file <(echo '{"project_name": "my_job", "include_job": "yes", "include_pipeline": "no", "include_python": "yes", "serverless": "yes"}') --profile <PROFILE> < /dev/null
```

Add a `CLAUDE.md` (or `AGENTS.md`) to the project root:

# Databricks Asset Bundles Project
This project uses Databricks Asset Bundles for deployment.
## Prerequisites
Install the Databricks CLI (>= v0.288.0) if not already installed:
- macOS: `brew tap databricks/tap && brew install databricks`
- Linux: `curl -fsSL https://raw.githubusercontent.com/databricks/setup-cli/main/install.sh | sh`
- Windows: `winget install Databricks.DatabricksCLI`
Verify: `databricks -v`
## For AI Agents
Read the `databricks` skill for CLI basics, authentication, and deployment workflow.
Read the `databricks-jobs` skill for job-specific guidance.
If skills are not available, install them: `databricks experimental aitools skills install`

A typical project layout:

```
my-job-project/
├── databricks.yml          # Bundle configuration
├── resources/
│   └── my_job.job.yml      # Job definition
├── src/
│   ├── my_notebook.ipynb   # Notebook tasks
│   └── my_module/          # Python wheel package
│       ├── __init__.py
│       └── main.py
├── tests/
│   └── test_main.py
└── pyproject.toml          # Python project config (if using wheels)
```

Define each job in `resources/<job_name>.job.yml`:

```yaml
resources:
  jobs:
    my_job:
      name: my_job
      tasks:
        - task_key: my_notebook
          notebook_task:
            notebook_path: ../src/my_notebook.ipynb
        - task_key: my_python
          depends_on:
            - task_key: my_notebook
          python_wheel_task:
            package_name: my_package
            entry_point: main
```

Supported task types include `notebook_task`, `python_wheel_task`, `spark_python_task`, `pipeline_task`, and `sql_task`.
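The `entry_point: main` above resolves to a console-script-style function in the wheel. A minimal sketch of what `src/my_module/main.py` might look like, assuming job parameters are forwarded as command-line arguments (the function and flag names here are illustrative, not part of the template):

```python
# Illustrative sketch of src/my_module/main.py for the python_wheel_task.
# The wheel's entry point ("main") is called directly; parameters arrive
# as command-line arguments via the task's `parameters` list.
import argparse


def run(catalog: str, schema: str) -> str:
    """Return the fully qualified table this job would process."""
    return f"{catalog}.{schema}.my_table"


def main() -> None:
    # Hypothetical flags; match them to the task's `parameters` list.
    parser = argparse.ArgumentParser()
    parser.add_argument("--catalog", default="main")
    parser.add_argument("--schema", default="default")
    args = parser.parse_args()
    print(f"Processing {run(args.catalog, args.schema)}")
```

Keeping the real work in a plain function like `run` makes the entry point trivial to unit test without a Databricks connection.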
Expose job-level parameters so every task can read them:

```yaml
resources:
  jobs:
    my_job:
      parameters:
        - name: catalog
          default: ${var.catalog}
        - name: schema
          default: ${var.schema}
```

In notebook tasks, read them with `dbutils.widgets.get("catalog")`:
```python
# Read parameters
catalog = dbutils.widgets.get("catalog")
schema = dbutils.widgets.get("schema")

# Read tables
df = spark.read.table(f"{catalog}.{schema}.my_table")

# SQL queries
result = spark.sql(f"SELECT * FROM {catalog}.{schema}.my_table LIMIT 10")

# Write output
df.write.mode("overwrite").saveAsTable(f"{catalog}.{schema}.output_table")
```
Schedule the job with a periodic trigger:

```yaml
resources:
  jobs:
    my_job:
      trigger:
        periodic:
          interval: 1
          unit: DAYS
```

Or with a Quartz cron schedule (here, daily at 02:00 UTC):

```yaml
resources:
  jobs:
    my_job:
      schedule:
        quartz_cron_expression: "0 0 2 * * ?"
        timezone_id: "UTC"
```
Chain tasks with `depends_on` to build a multi-step pipeline:

```yaml
resources:
  jobs:
    my_pipeline_job:
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/extract.ipynb
        - task_key: transform
          depends_on:
            - task_key: extract
          notebook_task:
            notebook_path: ../src/transform.ipynb
        - task_key: load
          depends_on:
            - task_key: transform
          notebook_task:
            notebook_path: ../src/load.ipynb
```

Development workflow:

```bash
# Run unit tests locally
uv run pytest

# Validate the bundle configuration
databricks bundle validate --profile <profile>

# Deploy to the dev target
databricks bundle deploy -t dev --profile <profile>

# Run the deployed job
databricks bundle run <job_name> -t dev --profile <profile>

# Inspect a specific run
databricks jobs get-run --run-id <id> --profile <profile>
```
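The `uv run pytest` step picks up the `tests/` directory from the project tree. A minimal sketch of what `tests/test_main.py` could contain; in the real project the helper would be imported from `my_module.main`, but it is inlined here (with an illustrative name) so the sketch is self-contained:

```python
# Illustrative tests/test_main.py for `uv run pytest`.
# `qualified` stands in for a pure helper that would normally be
# imported from my_module.main.


def qualified(catalog: str, schema: str, table: str) -> str:
    """Compose a three-part Unity Catalog table name."""
    return f"{catalog}.{schema}.{table}"


def test_qualified_builds_three_part_name():
    assert qualified("main", "bronze", "orders") == "main.bronze.orders"


def test_qualified_has_two_dots():
    assert qualified("c", "s", "t").count(".") == 2
```

Tests that exercise only pure functions run locally without any Databricks workspace, which is why the project tree separates `src/my_module/` logic from the notebook glue.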