
Databricks Debug Bundle


Overview


Collect all necessary diagnostic information for Databricks support tickets.

Prerequisites


  • Databricks CLI installed and configured
  • Access to cluster logs (admin or cluster owner)
  • Permission to access job run details

Instructions


Step 1: Create Debug Bundle Script

```bash
#!/bin/bash
# databricks-debug-bundle.sh

set -e

BUNDLE_DIR="databricks-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"

echo "=== Databricks Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date)" >> "$BUNDLE_DIR/summary.txt"
echo "Workspace: ${DATABRICKS_HOST}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
```

Step 2: Collect Environment Info

```bash
# Environment info
echo "--- Environment ---" >> "$BUNDLE_DIR/summary.txt"
echo "CLI Version: $(databricks --version)" >> "$BUNDLE_DIR/summary.txt"
echo "Python: $(python --version 2>&1)" >> "$BUNDLE_DIR/summary.txt"
echo "Databricks SDK: $(pip show databricks-sdk 2>/dev/null | grep Version)" >> "$BUNDLE_DIR/summary.txt"
echo "DATABRICKS_HOST: ${DATABRICKS_HOST}" >> "$BUNDLE_DIR/summary.txt"
echo "DATABRICKS_TOKEN: ${DATABRICKS_TOKEN:+[SET]}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Workspace info
echo "--- Workspace Info ---" >> "$BUNDLE_DIR/summary.txt"
databricks current-user me >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Failed to get user info"
echo "" >> "$BUNDLE_DIR/summary.txt"
```

Step 3: Collect Cluster Information

```bash
# Cluster details (if cluster_id provided)
CLUSTER_ID="${1:-}"
if [ -n "$CLUSTER_ID" ]; then
    echo "--- Cluster Info: $CLUSTER_ID ---" >> "$BUNDLE_DIR/summary.txt"
    databricks clusters get --cluster-id "$CLUSTER_ID" > "$BUNDLE_DIR/cluster_info.json" 2>&1

    # Extract key info
    jq -r '{
        state: .state,
        spark_version: .spark_version,
        node_type_id: .node_type_id,
        num_workers: .num_workers,
        autotermination_minutes: .autotermination_minutes
    }' "$BUNDLE_DIR/cluster_info.json" >> "$BUNDLE_DIR/summary.txt"

    # Get cluster events
    echo "--- Recent Cluster Events ---" >> "$BUNDLE_DIR/summary.txt"
    databricks clusters events --cluster-id "$CLUSTER_ID" --limit 20 > "$BUNDLE_DIR/cluster_events.json" 2>&1
    jq -r '.events[] | "\(.timestamp): \(.type) - \(.details)"' "$BUNDLE_DIR/cluster_events.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi
```

Step 4: Collect Job Run Information

```bash
# Job run details (if run_id provided)
RUN_ID="${2:-}"
if [ -n "$RUN_ID" ]; then
    echo "--- Job Run Info: $RUN_ID ---" >> "$BUNDLE_DIR/summary.txt"
    databricks runs get --run-id "$RUN_ID" > "$BUNDLE_DIR/run_info.json" 2>&1

    # Extract run state
    jq -r '{
        state: .state.life_cycle_state,
        result: .state.result_state,
        message: .state.state_message,
        start_time: .start_time,
        end_time: .end_time
    }' "$BUNDLE_DIR/run_info.json" >> "$BUNDLE_DIR/summary.txt"

    # Get run output
    echo "--- Run Output ---" >> "$BUNDLE_DIR/summary.txt"
    databricks runs get-output --run-id "$RUN_ID" > "$BUNDLE_DIR/run_output.json" 2>&1
    jq -r '.error // "No error"' "$BUNDLE_DIR/run_output.json" >> "$BUNDLE_DIR/summary.txt"

    # Task-level details
    jq -r '.tasks[] | "Task \(.task_key): \(.state.result_state)"' "$BUNDLE_DIR/run_info.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi
```
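The `jq` filters in this step assume the Jobs API run-object shape (`state.life_cycle_state`, `tasks[].task_key`). A self-contained sketch on a canned payload lets you sanity-check the extraction without a live run; the field values below are made up:

```bash
#!/bin/bash
# Demonstrate the Step 4 jq filters on a fabricated run_info.json.
cat > run_info.json << 'EOF'
{
  "state": {
    "life_cycle_state": "TERMINATED",
    "result_state": "FAILED",
    "state_message": "Task transform failed"
  },
  "start_time": 1700000000000,
  "end_time": 1700000600000,
  "tasks": [
    {"task_key": "ingest", "state": {"result_state": "SUCCESS"}},
    {"task_key": "transform", "state": {"result_state": "FAILED"}}
  ]
}
EOF

# Top-level run state as a compact object
jq -r '{state: .state.life_cycle_state, result: .state.result_state, message: .state.state_message}' run_info.json

# Per-task summary lines
jq -r '.tasks[] | "Task \(.task_key): \(.state.result_state)"' run_info.json
# prints: Task ingest: SUCCESS
#         Task transform: FAILED

rm -f run_info.json
```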

Step 5: Collect Spark Logs

```bash
# Spark driver logs (requires cluster_id)
if [ -n "$CLUSTER_ID" ]; then
    echo "--- Spark Driver Logs (last 500 lines) ---" > "$BUNDLE_DIR/driver_logs.txt"

    # Get logs via API
    python3 << EOF >> "$BUNDLE_DIR/driver_logs.txt" 2>&1
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
try:
    logs = w.clusters.get_cluster_driver_logs(cluster_id="$CLUSTER_ID")
    print(logs.log_content[:50000] if logs.log_content else "No logs available")
except Exception as e:
    print(f"Error fetching logs: {e}")
EOF
fi
```

Step 6: Collect Delta Table Info

```bash
# Delta table diagnostics (if table provided)
TABLE_NAME="${3:-}"
if [ -n "$TABLE_NAME" ]; then
    echo "--- Delta Table Info: $TABLE_NAME ---" >> "$BUNDLE_DIR/summary.txt"

    python3 << EOF >> "$BUNDLE_DIR/delta_info.txt" 2>&1
from databricks.sdk import WorkspaceClient
from databricks.connect import DatabricksSession

w = WorkspaceClient()
spark = DatabricksSession.builder.getOrCreate()
TABLE_NAME = "$TABLE_NAME"  # substituted by the shell before python3 runs

# Table history
print("=== Table History ===")
history_df = spark.sql(f"DESCRIBE HISTORY {TABLE_NAME} LIMIT 20")
history_df.show(truncate=False)

# Table details
print("\n=== Table Details ===")
spark.sql(f"DESCRIBE DETAIL {TABLE_NAME}").show(truncate=False)

# Schema
print("\n=== Schema ===")
spark.sql(f"DESCRIBE {TABLE_NAME}").show(truncate=False)
EOF
fi
```
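Step 6 relies on the heredoc's unquoted `EOF` delimiter: the shell substitutes `$TABLE_NAME` into the Python source before `python3` ever sees it. A minimal demonstration of that expansion, with an illustrative table name:

```bash
#!/bin/bash
# Unquoted heredoc delimiter: the shell expands $TABLE_NAME before python3 runs.
TABLE_NAME="catalog.schema.events"

python3 << EOF
# "$TABLE_NAME" was already expanded by the shell at this point
print("DESCRIBE HISTORY $TABLE_NAME LIMIT 20")
EOF
# prints: DESCRIBE HISTORY catalog.schema.events LIMIT 20
```

Quoting the delimiter (`<< 'EOF'`) would disable this substitution, which is why the script leaves it unquoted.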

Step 7: Package Bundle

```bash
# Create config snapshot (redacted)
echo "--- Config (redacted) ---" >> "$BUNDLE_DIR/summary.txt"
cat ~/.databrickscfg 2>/dev/null | sed 's/token = .*/token = REDACTED/' >> "$BUNDLE_DIR/config-redacted.txt"

# Network connectivity test
echo "--- Network Test ---" >> "$BUNDLE_DIR/summary.txt"
echo -n "API Health: " >> "$BUNDLE_DIR/summary.txt"
curl -s -o /dev/null -w "%{http_code}" \
    -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
    "${DATABRICKS_HOST}/api/2.0/clusters/list" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Package everything
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"

echo "Bundle created: $BUNDLE_DIR.tar.gz"
echo ""
echo "Contents:"
echo "  - summary.txt: Environment and error summary"
echo "  - cluster_info.json: Cluster configuration"
echo "  - cluster_events.json: Recent cluster events"
echo "  - run_info.json: Job run details"
echo "  - run_output.json: Task outputs and errors"
echo "  - driver_logs.txt: Spark driver logs"
echo "  - delta_info.txt: Delta table diagnostics"
echo "  - config-redacted.txt: CLI configuration (secrets removed)"
```
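The redaction above only masks `token = …` lines in `~/.databrickscfg`. A quick self-contained check of that `sed` expression (the sample config below is fabricated):

```bash
#!/bin/bash
# Verify the token-redaction sed expression from Step 7 on a sample config.
cat > sample.cfg << 'EOF'
[DEFAULT]
host = https://example.cloud.databricks.com
token = dapi1234567890abcdef
EOF

sed 's/token = .*/token = REDACTED/' sample.cfg
rm -f sample.cfg
```

Note the pattern is whitespace-sensitive: a `token=...` line without spaces around `=` would slip through unredacted, so still review the output file before submitting.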

Output


  • databricks-debug-YYYYMMDD-HHMMSS.tar.gz
    archive containing:
    • summary.txt
      - Environment and error summary
    • cluster_info.json
      - Cluster configuration
    • cluster_events.json
      - Recent cluster events
    • run_info.json
      - Job run details
    • run_output.json
      - Task outputs and errors
    • driver_logs.txt
      - Spark driver logs
    • delta_info.txt
      - Delta table diagnostics
    • config-redacted.txt
      - Configuration (secrets removed)
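Before attaching the archive, its contents can be checked without unpacking. The snippet builds a tiny stand-in bundle so the `tar` commands are runnable as-is; with a real bundle you would point them at the generated `databricks-debug-*.tar.gz`:

```bash
#!/bin/bash
# Build a minimal stand-in bundle, then inspect it without extracting.
BUNDLE_DIR="databricks-debug-demo"
mkdir -p "$BUNDLE_DIR"
echo "=== Databricks Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"

# List members without extracting
tar -tzf "$BUNDLE_DIR.tar.gz"

# Print a single member (the summary) to stdout
tar -xzf "$BUNDLE_DIR.tar.gz" -O "$BUNDLE_DIR/summary.txt"

rm -f "$BUNDLE_DIR.tar.gz"  # demo cleanup
```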

Error Handling


| Item | Purpose | Included |
|---|---|---|
| Environment versions | Compatibility check | Yes |
| Cluster config | Hardware/software setup | Yes |
| Cluster events | State changes, errors | Yes |
| Job run details | Task failures, timing | Yes |
| Spark logs | Stack traces, exceptions | Yes |
| Delta table info | Schema, history | Optional |
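Because the script runs under `set -e`, a single failed CLI call would otherwise abort the whole bundle. One way to make each collector degrade to a note instead of aborting is sketched below; the `collect` helper is illustrative, not part of the script above:

```bash
#!/bin/bash
# Wrap each collector: capture stdout+stderr to a file, and record a note
# instead of exiting when the command fails under `set -e`.
set -e
OUT=$(mktemp -d)

collect() {   # usage: collect <output-file> <command...>
    local name="$1"; shift
    "$@" > "$OUT/$name" 2>&1 || echo "collection of $name failed" >> "$OUT/$name"
}

collect date.txt date                    # succeeds normally
collect missing.txt no-such-command-xyz  # fails, but the script continues

ls "$OUT"
cat "$OUT/missing.txt"
rm -rf "$OUT"  # demo cleanup
```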

Examples


Sensitive Data Handling


ALWAYS REDACT:
  • API tokens and secrets
  • Personal access tokens
  • Connection strings
  • PII in logs
Safe to Include:
  • Error messages
  • Stack traces (check for PII)
  • Cluster IDs, job IDs
  • Configuration (without secrets)
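A quick mechanical pass over the extracted bundle can catch the obvious cases before submission (Databricks personal access tokens start with `dapi`); it is a heuristic, not a substitute for reading the files:

```bash
#!/bin/bash
# Scan a directory (an extracted bundle) for common secret patterns.
scan_for_secrets() {
    grep -rnEi 'dapi[0-9a-f]{8,}|bearer [a-z0-9._~+/-]{8,}' "$1" \
        && echo "WARNING: possible secrets found - redact before submitting" \
        || echo "No obvious tokens found"
}

# Demo on a throwaway directory standing in for an extracted bundle
demo=$(mktemp -d)
echo 'token = dapi1234567890abcdef' > "$demo/config.txt"
scan_for_secrets "$demo"
rm -rf "$demo"
```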

Usage


```bash
# Basic bundle (environment only)
./databricks-debug-bundle.sh

# With cluster diagnostics
./databricks-debug-bundle.sh cluster-12345-abcde

# With job run diagnostics
./databricks-debug-bundle.sh cluster-12345-abcde 67890

# Full diagnostics with Delta table
./databricks-debug-bundle.sh cluster-12345 67890 catalog.schema.table
```

Submit to Support


  1. Create bundle:
    bash databricks-debug-bundle.sh [cluster-id] [run-id]
  2. Review for sensitive data
  3. Open support ticket at Databricks Support
  4. Attach bundle to ticket


Next Steps

For rate limit issues, see databricks-rate-limits.