
Databricks Debug Bundle


Overview


Collect all necessary diagnostic information for Databricks support tickets.

Prerequisites


  • Databricks CLI installed and configured
  • Access to cluster logs (admin or cluster owner)
  • Permission to access job run details

Instructions


Step 1: Create Debug Bundle Script

```bash
#!/bin/bash
# databricks-debug-bundle.sh

set -e

BUNDLE_DIR="databricks-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"

echo "=== Databricks Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date)" >> "$BUNDLE_DIR/summary.txt"
echo "Workspace: ${DATABRICKS_HOST}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
```

Step 2: Collect Environment Info

```bash
# Environment info
echo "--- Environment ---" >> "$BUNDLE_DIR/summary.txt"
echo "CLI Version: $(databricks --version)" >> "$BUNDLE_DIR/summary.txt"
echo "Python: $(python --version 2>&1)" >> "$BUNDLE_DIR/summary.txt"
echo "Databricks SDK: $(pip show databricks-sdk 2>/dev/null | grep Version)" >> "$BUNDLE_DIR/summary.txt"
echo "DATABRICKS_HOST: ${DATABRICKS_HOST}" >> "$BUNDLE_DIR/summary.txt"
echo "DATABRICKS_TOKEN: ${DATABRICKS_TOKEN:+[SET]}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Workspace info
echo "--- Workspace Info ---" >> "$BUNDLE_DIR/summary.txt"
databricks current-user me >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Failed to get user info"
echo "" >> "$BUNDLE_DIR/summary.txt"
```

Step 3: Collect Cluster Information

```bash
# Cluster details (if cluster_id provided)
CLUSTER_ID="${1:-}"
if [ -n "$CLUSTER_ID" ]; then
    echo "--- Cluster Info: $CLUSTER_ID ---" >> "$BUNDLE_DIR/summary.txt"
    databricks clusters get --cluster-id "$CLUSTER_ID" > "$BUNDLE_DIR/cluster_info.json" 2>&1

    # Extract key info
    jq -r '{
        state: .state,
        spark_version: .spark_version,
        node_type_id: .node_type_id,
        num_workers: .num_workers,
        autotermination_minutes: .autotermination_minutes
    }' "$BUNDLE_DIR/cluster_info.json" >> "$BUNDLE_DIR/summary.txt"

    # Get cluster events
    echo "--- Recent Cluster Events ---" >> "$BUNDLE_DIR/summary.txt"
    databricks clusters events --cluster-id "$CLUSTER_ID" --limit 20 > "$BUNDLE_DIR/cluster_events.json" 2>&1
    jq -r '.events[] | "\(.timestamp): \(.type) - \(.details)"' "$BUNDLE_DIR/cluster_events.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi
```

Step 4: Collect Job Run Information

```bash
# Job run details (if run_id provided)
RUN_ID="${2:-}"
if [ -n "$RUN_ID" ]; then
    echo "--- Job Run Info: $RUN_ID ---" >> "$BUNDLE_DIR/summary.txt"
    databricks runs get --run-id "$RUN_ID" > "$BUNDLE_DIR/run_info.json" 2>&1

    # Extract run state
    jq -r '{
        state: .state.life_cycle_state,
        result: .state.result_state,
        message: .state.state_message,
        start_time: .start_time,
        end_time: .end_time
    }' "$BUNDLE_DIR/run_info.json" >> "$BUNDLE_DIR/summary.txt"

    # Get run output
    echo "--- Run Output ---" >> "$BUNDLE_DIR/summary.txt"
    databricks runs get-output --run-id "$RUN_ID" > "$BUNDLE_DIR/run_output.json" 2>&1
    jq -r '.error // "No error"' "$BUNDLE_DIR/run_output.json" >> "$BUNDLE_DIR/summary.txt"

    # Task-level details
    jq -r '.tasks[] | "Task \(.task_key): \(.state.result_state)"' "$BUNDLE_DIR/run_info.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi
```
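The `jq` filters in this step assume the Jobs API run-object shape (`state.life_cycle_state`, `tasks[].task_key`). A self-contained sketch on a canned payload lets you sanity-check the extraction without a live run; the field values below are made up:

```bash
#!/bin/bash
# Demonstrate the Step 4 jq filters on a fabricated run_info.json.
cat > run_info.json << 'EOF'
{
  "state": {
    "life_cycle_state": "TERMINATED",
    "result_state": "FAILED",
    "state_message": "Task transform failed"
  },
  "start_time": 1700000000000,
  "end_time": 1700000600000,
  "tasks": [
    {"task_key": "ingest", "state": {"result_state": "SUCCESS"}},
    {"task_key": "transform", "state": {"result_state": "FAILED"}}
  ]
}
EOF

# Top-level run state as a compact object
jq -r '{state: .state.life_cycle_state, result: .state.result_state, message: .state.state_message}' run_info.json

# Per-task summary lines
jq -r '.tasks[] | "Task \(.task_key): \(.state.result_state)"' run_info.json
# prints: Task ingest: SUCCESS
#         Task transform: FAILED

rm -f run_info.json
```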

Step 5: Collect Spark Logs

```bash
# Spark driver logs (requires cluster_id)
if [ -n "$CLUSTER_ID" ]; then
    echo "--- Spark Driver Logs (last 500 lines) ---" > "$BUNDLE_DIR/driver_logs.txt"

    # Get logs via API
    python3 << EOF >> "$BUNDLE_DIR/driver_logs.txt" 2>&1
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
try:
    logs = w.clusters.get_cluster_driver_logs(cluster_id="$CLUSTER_ID")
    print(logs.log_content[:50000] if logs.log_content else "No logs available")
except Exception as e:
    print(f"Error fetching logs: {e}")
EOF
fi
```

Step 6: Collect Delta Table Info

```bash
# Delta table diagnostics (if table provided)
TABLE_NAME="${3:-}"
if [ -n "$TABLE_NAME" ]; then
    echo "--- Delta Table Info: $TABLE_NAME ---" >> "$BUNDLE_DIR/summary.txt"

    python3 << EOF >> "$BUNDLE_DIR/delta_info.txt" 2>&1
from databricks.sdk import WorkspaceClient
from databricks.connect import DatabricksSession

w = WorkspaceClient()
spark = DatabricksSession.builder.getOrCreate()
TABLE_NAME = "$TABLE_NAME"  # substituted by the shell before python3 runs

# Table history
print("=== Table History ===")
history_df = spark.sql(f"DESCRIBE HISTORY {TABLE_NAME} LIMIT 20")
history_df.show(truncate=False)

# Table details
print("\n=== Table Details ===")
spark.sql(f"DESCRIBE DETAIL {TABLE_NAME}").show(truncate=False)

# Schema
print("\n=== Schema ===")
spark.sql(f"DESCRIBE {TABLE_NAME}").show(truncate=False)
EOF
fi
```
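Step 6 relies on the heredoc's unquoted `EOF` delimiter: the shell substitutes `$TABLE_NAME` into the Python source before `python3` ever sees it. A minimal demonstration of that expansion, with an illustrative table name:

```bash
#!/bin/bash
# Unquoted heredoc delimiter: the shell expands $TABLE_NAME before python3 runs.
TABLE_NAME="catalog.schema.events"

python3 << EOF
# "$TABLE_NAME" was already expanded by the shell at this point
print("DESCRIBE HISTORY $TABLE_NAME LIMIT 20")
EOF
# prints: DESCRIBE HISTORY catalog.schema.events LIMIT 20
```

Quoting the delimiter (`<< 'EOF'`) would disable this substitution, which is why the script leaves it unquoted.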

Step 7: Package Bundle

```bash
# Create config snapshot (redacted)
echo "--- Config (redacted) ---" >> "$BUNDLE_DIR/summary.txt"
cat ~/.databrickscfg 2>/dev/null | sed 's/token = .*/token = REDACTED/' >> "$BUNDLE_DIR/config-redacted.txt"

# Network connectivity test
echo "--- Network Test ---" >> "$BUNDLE_DIR/summary.txt"
echo -n "API Health: " >> "$BUNDLE_DIR/summary.txt"
curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "${DATABRICKS_HOST}/api/2.0/clusters/list" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Package everything
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"

echo "Bundle created: $BUNDLE_DIR.tar.gz"
echo ""
echo "Contents:"
echo "  - summary.txt: Environment and error summary"
echo "  - cluster_info.json: Cluster configuration"
echo "  - cluster_events.json: Recent cluster events"
echo "  - run_info.json: Job run details"
echo "  - run_output.json: Task outputs and errors"
echo "  - driver_logs.txt: Spark driver logs"
echo "  - delta_info.txt: Delta table diagnostics"
echo "  - config-redacted.txt: CLI configuration (secrets removed)"
```
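The redaction above only masks `token = …` lines in `~/.databrickscfg`. A quick self-contained check of that `sed` expression (the sample config below is fabricated):

```bash
#!/bin/bash
# Verify the token-redaction sed expression from Step 7 on a sample config.
cat > sample.cfg << 'EOF'
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi1234567890abcdef
EOF

sed 's/token = .*/token = REDACTED/' sample.cfg
rm -f sample.cfg
```

Note the pattern is whitespace-sensitive: a `token=...` line without spaces around `=` would slip through unredacted, so still review the output file before submitting.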

Output


  • databricks-debug-YYYYMMDD-HHMMSS.tar.gz
    archive containing:
    • summary.txt
      - Environment and error summary
    • cluster_info.json
      - Cluster configuration
    • cluster_events.json
      - Recent cluster events
    • run_info.json
      - Job run details
    • run_output.json
      - Task outputs and errors
    • driver_logs.txt
      - Spark driver logs
    • delta_info.txt
      - Delta table diagnostics
    • config-redacted.txt
      - Configuration (secrets removed)
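Before attaching the archive, its contents can be checked without unpacking. The snippet builds a tiny stand-in bundle so the `tar` commands are runnable as-is; with a real bundle you would point them at the generated `databricks-debug-*.tar.gz`:

```bash
#!/bin/bash
# Build a minimal stand-in bundle, then inspect it without extracting.
BUNDLE_DIR="databricks-debug-demo"
mkdir -p "$BUNDLE_DIR"
echo "=== Databricks Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"

# List members without extracting
tar -tzf "$BUNDLE_DIR.tar.gz"

# Print a single member (the summary) to stdout
tar -xzf "$BUNDLE_DIR.tar.gz" -O "$BUNDLE_DIR/summary.txt"

rm -f "$BUNDLE_DIR.tar.gz"  # demo cleanup
```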

Error Handling


| Item | Purpose | Included |
|---|---|---|
| Environment versions | Compatibility check | Yes |
| Cluster config | Hardware/software setup | Yes |
| Cluster events | State changes, errors | Yes |
| Job run details | Task failures, timing | Yes |
| Spark logs | Stack traces, exceptions | Yes |
| Delta table info | Schema, history | Optional |
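Because the script runs under `set -e`, a single failed CLI call would otherwise abort the whole bundle. One way to make each collector degrade to a note instead of aborting is sketched below; the `collect` helper is illustrative, not part of the script above:

```bash
#!/bin/bash
# Wrap each collector: capture stdout+stderr to a file, and record a note
# instead of exiting when the command fails under `set -e`.
set -e
OUT=$(mktemp -d)

collect() {   # usage: collect <output-file> <command...>
    local name="$1"; shift
    "$@" > "$OUT/$name" 2>&1 || echo "collection of $name failed" >> "$OUT/$name"
}

collect date.txt date                    # succeeds normally
collect missing.txt no-such-command-xyz  # fails, but the script continues

ls "$OUT"
cat "$OUT/missing.txt"
rm -rf "$OUT"  # demo cleanup
```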

Examples


Sensitive Data Handling


ALWAYS REDACT:
  • API tokens and secrets
  • Personal access tokens
  • Connection strings
  • PII in logs
Safe to Include:
  • Error messages
  • Stack traces (check for PII)
  • Cluster IDs, job IDs
  • Configuration (without secrets)
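A quick mechanical pass over the extracted bundle can catch the obvious cases before submission (Databricks personal access tokens start with `dapi`); it is a heuristic, not a substitute for reading the files:

```bash
#!/bin/bash
# Scan a directory (an extracted bundle) for common secret patterns.
scan_for_secrets() {
    grep -rnEi 'dapi[0-9a-f]{8,}|bearer [a-z0-9._~+/-]{8,}' "$1" \
        && echo "WARNING: possible secrets found - redact before submitting" \
        || echo "No obvious tokens found"
}

# Demo on a throwaway directory standing in for an extracted bundle
demo=$(mktemp -d)
echo 'token = dapi1234567890abcdef' > "$demo/config.txt"
scan_for_secrets "$demo"
rm -rf "$demo"
```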

Usage


```bash
# Basic bundle (environment only)
./databricks-debug-bundle.sh

# With cluster diagnostics
./databricks-debug-bundle.sh cluster-12345-abcde

# With job run diagnostics
./databricks-debug-bundle.sh cluster-12345-abcde 67890

# Full diagnostics with Delta table
./databricks-debug-bundle.sh cluster-12345 67890 catalog.schema.table
```

Submit to Support


  1. Create bundle:
    bash databricks-debug-bundle.sh [cluster-id] [run-id]
  2. Review for sensitive data
  3. Open support ticket at Databricks Support
  4. Attach bundle to ticket


Next Steps

For rate limit issues, see databricks-rate-limits.