Databricks Lakeflow Jobs

Overview

Databricks Jobs orchestrate data workflows with multi-task DAGs, flexible triggers, and comprehensive monitoring. Jobs support diverse task types and can be managed via Python SDK, CLI, or Asset Bundles.

Reference Files

Use Case	Reference File
Configure task types (notebook, Python, SQL, dbt, etc.)	task-types.md
Set up triggers and schedules	triggers-schedules.md
Configure notifications and health monitoring	notifications-monitoring.md
Complete working examples	examples.md

Quick Start

Python SDK

python

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, NotebookTask, Source

w = WorkspaceClient()

job = w.jobs.create(
    name="my-etl-job",
    tasks=[
        Task(
            task_key="extract",
            notebook_task=NotebookTask(
                notebook_path="/Workspace/Users/user@example.com/extract",
                source=Source.WORKSPACE
            )
        )
    ]
)
print(f"Created job: {job.job_id}")

CLI

bash

databricks jobs create --json '{
  "name": "my-etl-job",
  "tasks": [{
    "task_key": "extract",
    "notebook_task": {
      "notebook_path": "/Workspace/Users/user@example.com/extract",
      "source": "WORKSPACE"
    }
  }]
}'

Asset Bundles (DABs)

yaml

# resources/jobs.yml
resources:
  jobs:
    my_etl_job:
      name: "[${bundle.target}] My ETL Job"
      tasks:
        - task_key: extract
          notebook_task:
            notebook_path: ../src/notebooks/extract.py

Core Concepts

Multi-Task Workflows

Jobs support DAG-based task dependencies:

yaml

tasks:
  - task_key: extract
    notebook_task:
      notebook_path: ../src/extract.py

  - task_key: transform
    depends_on:
      - task_key: extract
    notebook_task:
      notebook_path: ../src/transform.py

  - task_key: load
    depends_on:
      - task_key: transform
    run_if: ALL_SUCCESS  # Only run if all dependencies succeed
    notebook_task:
      notebook_path: ../src/load.py

run_if conditions:

```
ALL_SUCCESS
```
(default) - Run when all dependencies succeed
```
ALL_DONE
```
- Run when all dependencies complete (success or failure)
```
AT_LEAST_ONE_SUCCESS
```
- Run when at least one dependency succeeds
```
NONE_FAILED
```
- Run when no dependencies failed
```
ALL_FAILED
```
- Run when all dependencies failed
```
AT_LEAST_ONE_FAILED
```
- Run when at least one dependency failed

Task Types Summary

Task Type	Use Case	Reference
`notebook_task`	Run notebooks	task-types.md#notebook-task
`spark_python_task`	Run Python scripts	task-types.md#spark-python-task
`python_wheel_task`	Run Python wheels	task-types.md#python-wheel-task
`sql_task`	Run SQL queries/files	task-types.md#sql-task
`dbt_task`	Run dbt projects	task-types.md#dbt-task
`pipeline_task`	Trigger DLT/SDP pipelines	task-types.md#pipeline-task
`spark_jar_task`	Run Spark JARs	task-types.md#spark-jar-task
`run_job_task`	Trigger other jobs	task-types.md#run-job-task
`for_each_task`	Loop over inputs	task-types.md#for-each-task

Trigger Types Summary

Trigger Type	Use Case	Reference
`schedule`	Cron-based scheduling	triggers-schedules.md#cron-schedule
`trigger.periodic`	Interval-based	triggers-schedules.md#periodic-trigger
`trigger.file_arrival`	File arrival events	triggers-schedules.md#file-arrival-trigger
`trigger.table_update`	Table change events	triggers-schedules.md#table-update-trigger
`continuous`	Always-running jobs	triggers-schedules.md#continuous-jobs

Compute Configuration

Job Clusters (Recommended)

Define reusable cluster configurations:

yaml

job_clusters:
  - job_cluster_key: shared_cluster
    new_cluster:
      spark_version: "15.4.x-scala2.12"
      node_type_id: "i3.xlarge"
      num_workers: 2
      spark_conf:
        spark.speculation: "true"

tasks:
  - task_key: my_task
    job_cluster_key: shared_cluster
    notebook_task:
      notebook_path: ../src/notebook.py

Autoscaling Clusters

yaml

new_cluster:
  spark_version: "15.4.x-scala2.12"
  node_type_id: "i3.xlarge"
  autoscale:
    min_workers: 2
    max_workers: 8

Existing Cluster

yaml

tasks:
  - task_key: my_task
    existing_cluster_id: "0123-456789-abcdef12"
    notebook_task:
      notebook_path: ../src/notebook.py

Serverless Compute

For notebook and Python tasks, omit cluster configuration to use serverless:

yaml

tasks:
  - task_key: serverless_task
    notebook_task:
      notebook_path: ../src/notebook.py
    # No cluster config = serverless

Job Parameters

Define Parameters

yaml

parameters:
  - name: env
    default: "dev"
  - name: date
    default: "{{start_date}}"  # Dynamic value reference

Access in Notebook

python

# In notebook
dbutils.widgets.get("env")
dbutils.widgets.get("date")

Pass to Tasks

yaml

tasks:
  - task_key: my_task
    notebook_task:
      notebook_path: ../src/notebook.py
      base_parameters:
        env: "{{job.parameters.env}}"
        custom_param: "value"

Common Operations

Python SDK Operations

python

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List jobs
jobs = w.jobs.list()

# Get job details
job = w.jobs.get(job_id=12345)

# Run job now
run = w.jobs.run_now(job_id=12345)

# Run with parameters
run = w.jobs.run_now(
    job_id=12345,
    job_parameters={"env": "prod", "date": "2024-01-15"}
)

# Cancel run
w.jobs.cancel_run(run_id=run.run_id)

# Delete job
w.jobs.delete(job_id=12345)

CLI Operations

bash

# List jobs
databricks jobs list

# Get job details
databricks jobs get 12345

# Run job
databricks jobs run-now 12345

# Run with parameters
databricks jobs run-now 12345 --job-params '{"env": "prod"}'

# Cancel run
databricks jobs cancel-run 67890

# Delete job
databricks jobs delete 12345

Asset Bundle Operations

bash

# Validate configuration
databricks bundle validate

# Deploy job
databricks bundle deploy

# Run job
databricks bundle run my_job_resource_key

# Deploy to specific target
databricks bundle deploy -t prod

# Destroy resources
databricks bundle destroy

Permissions (DABs)

yaml

resources:
  jobs:
    my_job:
      name: "My Job"
      permissions:
        - level: CAN_VIEW
          group_name: "data-analysts"
        - level: CAN_MANAGE_RUN
          group_name: "data-engineers"
        - level: CAN_MANAGE
          user_name: "admin@example.com"

Permission levels:

```
CAN_VIEW
```
- View job and run history
```
CAN_MANAGE_RUN
```
- View, trigger, and cancel runs
```
CAN_MANAGE
```
- Full control including edit and delete

Common Issues

Issue	Solution
Job cluster startup slow	Use job clusters with `job_cluster_key` for reuse across tasks
Task dependencies not working	Verify `task_key` references match exactly in `depends_on`
Schedule not triggering	Check `pause_status: UNPAUSED` and valid timezone
File arrival not detecting	Ensure path has proper permissions and uses cloud storage URL
Table update trigger missing events	Verify Unity Catalog table and proper grants
Parameter not accessible	Use `dbutils.widgets.get()` in notebooks
"admins" group error	Cannot modify admins permissions on jobs
Serverless task fails	Ensure task type supports serverless (notebook, Python)

Related Skills

asset-bundles - Deploy jobs via Databricks Asset Bundles
spark-declarative-pipelines - Configure pipelines triggered by jobs

databricks-jobs

NPX Install

Tags

SKILL.md Content

Databricks Lakeflow Jobs

Overview

Reference Files

Quick Start

Python SDK

CLI

Asset Bundles (DABs)

Core Concepts

Multi-Task Workflows

Task Types Summary

Trigger Types Summary

Compute Configuration

Job Clusters (Recommended)

Autoscaling Clusters

Existing Cluster

Serverless Compute

Job Parameters

Define Parameters

Access in Notebook

Pass to Tasks

Common Operations

Python SDK Operations

CLI Operations

Asset Bundle Operations

Permissions (DABs)

Common Issues

Related Skills

Resources