databricks-ci-integration
Databricks CI Integration
Overview
Set up CI/CD pipelines for Databricks using GitHub Actions and Asset Bundles.
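The workflows in this guide assume an Asset Bundles project with `dev`, `staging`, and `prod` targets, plus bundle jobs named `integration-tests` and `smoke-test` defined under `resources` (omitted here). A minimal `databricks.yml` along those lines might look like this; the bundle name and workspace URLs are placeholders, not values from this guide:

```yaml
# databricks.yml (sketch -- adapt names and hosts to your project)
bundle:
  name: my-pipeline

targets:
  dev:
    mode: development
    default: true
  staging:
    workspace:
      host: https://staging.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://prod.azuredatabricks.net
```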
Prerequisites
- GitHub repository with Actions enabled
- Databricks workspace with service principal
- Asset Bundles project structure
Instructions
Step 1: Configure Service Principal
```bash
# Create a service principal in Databricks
databricks service-principals create --json '{
  "display_name": "GitHub Actions CI",
  "active": true
}'
# Note the application_id returned

# Create an OAuth secret for the service principal
databricks service-principal-secrets create --service-principal-id <application_id>

# Grant permissions to the service principal
databricks permissions update workspace --json '{
  "access_control_list": [{
    "service_principal_name": "<application_id>",
    "permission_level": "CAN_MANAGE"
  }]
}'
```
Step 2: Configure GitHub Secrets
```bash
# Set GitHub secrets
gh secret set DATABRICKS_HOST --body "https://adb-1234567890.1.azuredatabricks.net"
gh secret set DATABRICKS_CLIENT_ID --body "your-client-id"
gh secret set DATABRICKS_CLIENT_SECRET --body "your-client-secret"

# For staging/prod environments
gh secret set DATABRICKS_HOST_STAGING --body "https://staging.azuredatabricks.net"
gh secret set DATABRICKS_HOST_PROD --body "https://prod.azuredatabricks.net"
```
Step 3: Create GitHub Actions Workflow
```yaml
# .github/workflows/databricks-ci.yml
name: Databricks CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
  DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'
      - name: Install Databricks CLI
        # Asset Bundles require the current (Go) CLI, not the legacy pip package
        uses: databricks/setup-cli@main
      - name: Install dependencies
        run: pip install databricks-sdk pytest
      - name: Validate Asset Bundle
        run: databricks bundle validate
      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short

  deploy-staging:
    needs: validate
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_STAGING }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        uses: databricks/setup-cli@main
      - name: Deploy to Staging
        run: databricks bundle deploy -t staging
      - name: Run Integration Tests
        # bundle run waits for the job to finish and exits non-zero on failure
        run: databricks bundle run -t staging integration-tests

  deploy-production:
    # deploy-staging only runs on develop, so a `needs` dependency on it would
    # always be skipped on main; gate on validate instead
    needs: validate
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment:
      name: production
      url: ${{ secrets.DATABRICKS_HOST_PROD }}
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        uses: databricks/setup-cli@main
      - name: Deploy to Production
        run: databricks bundle deploy -t prod
      - name: Verify Deployment
        run: |
          databricks bundle summary -t prod
          # Trigger smoke test
          databricks bundle run -t prod smoke-test
```
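`databricks bundle run` blocks until the triggered job finishes, but if you prefer to trigger a run and poll its state yourself, the wait loop is worth factoring out. This is a hedged sketch: `get_state` stands in for whatever call fetches the run state (for example `WorkspaceClient().jobs.get_run` from `databricks-sdk`), and the state strings mirror the Jobs API's life-cycle and result states.

```python
import time


def wait_for_run(get_state, run_id, timeout_s=1800, poll_s=10, sleep=time.sleep):
    """Poll a run until it reaches a terminal state or the timeout expires.

    get_state(run_id) must return a tuple (life_cycle_state, result_state),
    e.g. ("RUNNING", None) or ("TERMINATED", "SUCCESS").
    """
    terminal = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        life_cycle, result = get_state(run_id)
        if life_cycle in terminal:
            if result != "SUCCESS":
                raise RuntimeError(f"run {run_id} finished with {result!r}")
            return result
        sleep(poll_s)  # injected so tests can skip real waiting
    raise TimeoutError(f"run {run_id} did not finish within {timeout_s}s")
```

Raising on anything other than `SUCCESS` makes the CI step fail loudly, which matches the exit-code check the workflow above relies on.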
Step 4: PR Validation Workflow
```yaml
# .github/workflows/pr-validation.yml
name: PR Validation

on:
  pull_request:
    branches: [main, develop]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: pip install ruff mypy pytest pytest-cov databricks-sdk
      - name: Lint with ruff
        run: ruff check src/
      - name: Type check with mypy
        run: mypy src/ --ignore-missing-imports
      - name: Run tests with coverage
        run: pytest tests/unit/ --cov=src --cov-report=xml
      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.xml

  bundle-validation:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        uses: databricks/setup-cli@main
      - name: Validate bundle for all targets
        run: |
          databricks bundle validate -t dev
          databricks bundle validate -t staging
          databricks bundle validate -t prod
      - name: Check for breaking changes
        run: |
          # Compare job configurations
          databricks bundle summary -t prod --output json > current.json
          # Add logic to detect breaking changes
```
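The "detect breaking changes" placeholder above can be filled in many ways; one minimal sketch is to diff two `bundle summary` JSON dumps and flag removed jobs or tasks. The `resources.jobs.<name>.tasks` shape used here is an assumption about the summary output, so verify it against your CLI version before relying on it:

```python
def find_breaking_changes(old, new):
    """Compare two bundle-summary dicts; list jobs/tasks that disappeared.

    Assumes (hypothetically) that summaries look like
    {"resources": {"jobs": {"<name>": {"tasks": [{"task_key": ...}, ...]}}}}.
    """
    issues = []
    old_jobs = old.get("resources", {}).get("jobs", {})
    new_jobs = new.get("resources", {}).get("jobs", {})
    for name, job in old_jobs.items():
        if name not in new_jobs:
            issues.append(f"job removed: {name}")
            continue
        old_tasks = {t.get("task_key") for t in job.get("tasks", [])}
        new_tasks = {t.get("task_key") for t in new_jobs[name].get("tasks", [])}
        for missing in sorted(old_tasks - new_tasks):
            issues.append(f"task removed from {name}: {missing}")
    return issues
```

In the workflow this would run against `current.json` and a baseline summary checked out from the target branch, failing the step when the returned list is non-empty.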
Step 5: Nightly Test Workflow
```yaml
# .github/workflows/nightly-tests.yml
name: Nightly Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC daily
  workflow_dispatch:

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_STAGING }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4
      - name: Install Databricks CLI
        uses: databricks/setup-cli@main
      - name: Run full integration test suite
        run: |
          databricks bundle deploy -t staging
          # bundle run waits for the job to finish and exits non-zero on failure
          databricks bundle run -t staging full-integration-tests
      - name: Generate test report
        if: always()
        run: |
          # Download test results
          databricks fs cp dbfs:/test-results/latest/ ./test-results/ --recursive
      - name: Upload test artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: test-results/
      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          channel-id: 'data-engineering-alerts'
          slack-message: 'Nightly tests failed! Check ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}'
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
```

Output
- Automated test pipeline
- PR checks configured
- Staging deployment on merge to develop
- Production deployment on merge to main
Error Handling
| Issue | Cause | Solution |
|---|---|---|
| Auth failed | Invalid credentials | Regenerate service principal secret |
| Bundle validation failed | Invalid YAML | Run `databricks bundle validate` locally |
| Deployment timeout | Slow cluster startup | Use warm pools or increase timeout |
| Tests failed | Code regression | Fix code and re-run |
Examples
Matrix Testing (Multiple DBR Versions)
```yaml
jobs:
  test-matrix:
    strategy:
      matrix:
        dbr_version: ['13.3', '14.3', '15.1']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Test on DBR ${{ matrix.dbr_version }}
        run: |
          databricks bundle deploy -t test-${{ matrix.dbr_version }}
          databricks bundle run -t test-${{ matrix.dbr_version }} tests
```
Branch Protection Rules
```yaml
# Set via GitHub API or UI
required_status_checks:
  - "lint-and-test"
  - "bundle-validation"
required_reviews: 1
dismiss_stale_reviews: true
```
Resources

Next Steps

For deployment patterns, see `databricks-deploy-integration`.