Databricks CI Integration

Overview

Set up CI/CD pipelines for Databricks using GitHub Actions and Asset Bundles.

Prerequisites

  • GitHub repository with Actions enabled
  • Databricks workspace with service principal
  • Asset Bundles project structure

Instructions

Step 1: Configure Service Principal

```bash
# Create a service principal in Databricks
databricks service-principals create --json '{
  "display_name": "GitHub Actions CI",
  "active": true
}'

# Note the application_id returned

# Create an OAuth secret for the service principal
databricks service-principal-secrets create \
  --service-principal-id <application_id>

# Grant workspace permissions to the service principal
databricks permissions update workspace --json '{
  "access_control_list": [{
    "service_principal_name": "<application_id>",
    "permission_level": "CAN_MANAGE"
  }]
}'
```
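The inline JSON for the permissions call is easy to get wrong by hand. A small helper that builds it, as a sketch using only the standard library (`<application_id>` stays a placeholder for whatever the create command returned):

```python
import json

def grant_payload(application_id: str, level: str = "CAN_MANAGE") -> str:
    """Build the --json argument for the permissions update command,
    granting `level` to the given service principal."""
    acl = {
        "access_control_list": [
            {
                "service_principal_name": application_id,
                "permission_level": level,
            }
        ]
    }
    return json.dumps(acl)

# Pass the result as the --json argument of the CLI call above
print(grant_payload("<application_id>"))
```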

Step 2: Configure GitHub Secrets

```bash
# Set GitHub secrets
gh secret set DATABRICKS_HOST --body "https://adb-1234567890.1.azuredatabricks.net"
gh secret set DATABRICKS_CLIENT_ID --body "your-client-id"
gh secret set DATABRICKS_CLIENT_SECRET --body "your-client-secret"

# For staging/prod environments
gh secret set DATABRICKS_HOST_STAGING --body "https://staging.azuredatabricks.net"
gh secret set DATABRICKS_HOST_PROD --body "https://prod.azuredatabricks.net"
```
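A missing secret surfaces later as a confusing auth error, so it can help to fail fast. A hypothetical preflight check (not part of the workflows in this guide) for the variables above:

```python
import os

# The secrets every workflow in this guide expects as environment variables
REQUIRED = ("DATABRICKS_HOST", "DATABRICKS_CLIENT_ID", "DATABRICKS_CLIENT_SECRET")

def missing_secrets(env) -> list:
    """Names of required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED if not env.get(name)]

# In a CI step, fail fast before any databricks call:
# gaps = missing_secrets(os.environ)
# if gaps:
#     raise SystemExit("Missing CI configuration: " + ", ".join(gaps))
```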

Step 3: Create GitHub Actions Workflow

```yaml
# .github/workflows/databricks-ci.yml
name: Databricks CI/CD

on:
  push:
    branches: [main, develop]
  pull_request:
    branches: [main]

env:
  DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
  DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
  DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
          cache: 'pip'

      - name: Install Databricks CLI
        # Asset Bundles require the new (Go) Databricks CLI, not the
        # legacy pip package, so install it via the official action
        uses: databricks/setup-cli@main

      - name: Install dependencies
        run: pip install databricks-sdk pytest

      - name: Validate Asset Bundle
        run: databricks bundle validate

      - name: Run unit tests
        run: pytest tests/unit/ -v --tb=short

  deploy-staging:
    needs: validate
    if: github.ref == 'refs/heads/develop'
    runs-on: ubuntu-latest
    environment: staging
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_STAGING }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Deploy to Staging
        run: databricks bundle deploy -t staging

      - name: Run Integration Tests
        # bundle run waits for the triggered job to finish and exits
        # non-zero if the run fails, failing this step
        run: databricks bundle run -t staging integration-tests

  deploy-production:
    needs: [validate, deploy-staging]
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment:
      name: production
      url: ${{ secrets.DATABRICKS_HOST_PROD }}
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_PROD }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Deploy to Production
        run: databricks bundle deploy -t prod

      - name: Verify Deployment
        run: |
          databricks bundle summary -t prod
          # Trigger smoke test
          databricks bundle run -t prod smoke-test
```
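The staging job relies on the CLI blocking until the triggered run finishes. If you ever need your own timeout or per-state handling instead, the underlying poll-until-terminal pattern looks roughly like this (a sketch; `get_state` stands in for however you fetch the run's `state` object from the Jobs API):

```python
import time

# Terminal life-cycle states of a Databricks job run
TERMINAL = {"TERMINATED", "SKIPPED", "INTERNAL_ERROR"}

def wait_for_run(get_state, poll_seconds=30, timeout_seconds=3600):
    """Poll until the run reaches a terminal life-cycle state and
    return its result_state (e.g. "SUCCESS" or "FAILED")."""
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = get_state()
        if state.get("life_cycle_state") in TERMINAL:
            return state.get("result_state", "UNKNOWN")
        time.sleep(poll_seconds)
    raise TimeoutError("run did not reach a terminal state in time")

# In CI you would then fail the step unless the result is "SUCCESS"
```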

Step 4: PR Validation Workflow

```yaml
# .github/workflows/pr-validation.yml
name: PR Validation

on:
  pull_request:
    branches: [main, develop]

jobs:
  lint-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'

      - name: Install dependencies
        run: pip install ruff mypy pytest pytest-cov databricks-sdk

      - name: Lint with ruff
        run: ruff check src/

      - name: Type check with mypy
        run: mypy src/ --ignore-missing-imports

      - name: Run tests with coverage
        run: pytest tests/unit/ --cov=src --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v4
        with:
          files: coverage.xml

  bundle-validation:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        # Bundles require the new Databricks CLI, not the legacy pip package
        uses: databricks/setup-cli@main

      - name: Validate bundle for all targets
        run: |
          databricks bundle validate -t dev
          databricks bundle validate -t staging
          databricks bundle validate -t prod

      - name: Check for breaking changes
        run: |
          # Compare job configurations
          databricks bundle summary -t prod --output json > current.json
          # Add logic to detect breaking changes
```
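The final step above is deliberately a stub. One simple way to fill it in is to diff the job sets between the main branch's bundle summary and the PR's: a job that disappears is a reasonable proxy for a breaking change. A sketch, assuming you have parsed both summaries into dicts with a top-level `jobs` mapping (verify the actual summary shape against your CLI version's output):

```python
def removed_jobs(base: dict, current: dict) -> set:
    """Jobs defined in the base bundle summary but absent from the
    current one -- a crude breaking-change signal."""
    return set(base.get("jobs", {})) - set(current.get("jobs", {}))

# Example: the PR drops the "report" job
base = {"jobs": {"ingest": {}, "report": {}}}
current = {"jobs": {"ingest": {}}}
# removed_jobs(base, current) -> {"report"}
```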

Step 5: Nightly Test Workflow

```yaml
# .github/workflows/nightly-tests.yml
name: Nightly Tests

on:
  schedule:
    - cron: '0 2 * * *'  # 2 AM UTC daily
  workflow_dispatch:

jobs:
  integration-tests:
    runs-on: ubuntu-latest
    env:
      DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST_STAGING }}
      DATABRICKS_CLIENT_ID: ${{ secrets.DATABRICKS_CLIENT_ID }}
      DATABRICKS_CLIENT_SECRET: ${{ secrets.DATABRICKS_CLIENT_SECRET }}
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Run full integration test suite
        run: |
          databricks bundle deploy -t staging
          # bundle run waits for the job and fails this step if the run fails
          databricks bundle run -t staging full-integration-tests

      - name: Generate test report
        if: always()
        run: |
          # Download test results
          databricks fs cp dbfs:/test-results/latest/ ./test-results/ --recursive

      - name: Upload test artifacts
        uses: actions/upload-artifact@v4
        if: always()
        with:
          name: test-results
          path: test-results/

      - name: Notify on failure
        if: failure()
        uses: slackapi/slack-github-action@v1
        with:
          channel-id: 'data-engineering-alerts'
          slack-message: 'Nightly tests failed! Check ${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}'
        env:
          SLACK_BOT_TOKEN: ${{ secrets.SLACK_BOT_TOKEN }}
```
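If the nightly suite writes JUnit-style XML under dbfs:/test-results/ (an assumption; match this to whatever your test jobs actually emit), the report step could summarize the downloaded files with only the standard library:

```python
import xml.etree.ElementTree as ET

def summarize_junit(xml_text: str) -> dict:
    """Total/failed/passed counts from a JUnit XML report; handles
    both a bare <testsuite> and a <testsuites> wrapper."""
    root = ET.fromstring(xml_text)
    suites = [root] if root.tag == "testsuite" else root.findall("testsuite")
    total = sum(int(s.get("tests", "0")) for s in suites)
    failed = sum(int(s.get("failures", "0")) + int(s.get("errors", "0")) for s in suites)
    return {"total": total, "failed": failed, "passed": total - failed}
```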

Output

  • Automated test pipeline
  • PR checks configured
  • Staging deployment on merge to develop
  • Production deployment on merge to main

Error Handling

| Issue | Cause | Solution |
| --- | --- | --- |
| Auth failed | Invalid credentials | Regenerate service principal secret |
| Bundle validation failed | Invalid YAML | Run `databricks bundle validate` locally |
| Deployment timeout | Slow cluster startup | Use warm pools or increase timeout |
| Tests failed | Code regression | Fix code and re-run |
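Transient failures such as slow cluster startup often clear on a second attempt, so a retry wrapper around the deploy call is a common mitigation. A generic sketch, not Databricks-specific:

```python
import time

def retry(fn, attempts=3, base_delay=5.0):
    """Call fn(), retrying with exponential backoff; re-raise the
    last error once attempts are exhausted."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

# Usage (hypothetical): retry(lambda: subprocess.run(
#     ["databricks", "bundle", "deploy", "-t", "staging"], check=True))
```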

Examples

Matrix Testing (Multiple DBR Versions)

```yaml
jobs:
  test-matrix:
    strategy:
      matrix:
        dbr_version: ['13.3', '14.3', '15.1']
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install Databricks CLI
        uses: databricks/setup-cli@main

      - name: Test on DBR ${{ matrix.dbr_version }}
        run: |
          databricks bundle deploy -t test-${{ matrix.dbr_version }}
          databricks bundle run -t test-${{ matrix.dbr_version }} tests
```

Branch Protection Rules

```yaml
# Set via GitHub API or UI
required_status_checks:
  - "lint-and-test"
  - "bundle-validation"
required_reviews: 1
dismiss_stale_reviews: true
```
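The same rules can be pushed through the GitHub REST API (`PUT /repos/{owner}/{repo}/branches/{branch}/protection`). A sketch that builds the request body; the field names follow the REST API rather than the shorthand above, so double-check them against current GitHub documentation:

```python
import json

def protection_payload(contexts, required_reviews=1):
    """Request body for the branch protection endpoint."""
    return {
        "required_status_checks": {"strict": True, "contexts": list(contexts)},
        "enforce_admins": True,
        "required_pull_request_reviews": {
            "required_approving_review_count": required_reviews,
            "dismiss_stale_reviews": True,
        },
        "restrictions": None,
    }

# e.g. gh api -X PUT repos/OWNER/REPO/branches/main/protection --input body.json
body = json.dumps(protection_payload(["lint-and-test", "bundle-validation"]))
```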

Resources

Next Steps

For deployment patterns, see databricks-deploy-integration.