Databricks Debug Bundle
Overview
Collect all necessary diagnostic information for Databricks support tickets.
Prerequisites
前提条件
- Databricks CLI installed and configured
- Access to cluster logs (admin or cluster owner)
- Permission to access job run details
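The tooling side of these prerequisites can be preflighted before running anything. A minimal sketch (the tool list below reflects what this guide's script uses; it is not an official Databricks check):

```shell
# Preflight: confirm the commands the bundle script relies on are on PATH
check_tool() {
  command -v "$1" >/dev/null 2>&1 || { echo "missing: $1" >&2; return 1; }
}

for tool in databricks jq curl tar python3; do
  check_tool "$tool" || echo "install $tool before building a bundle"
done
```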
Instructions
Step 1: Create Debug Bundle Script
```bash
#!/bin/bash
# databricks-debug-bundle.sh
set -e

BUNDLE_DIR="databricks-debug-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$BUNDLE_DIR"

echo "=== Databricks Debug Bundle ===" > "$BUNDLE_DIR/summary.txt"
echo "Generated: $(date)" >> "$BUNDLE_DIR/summary.txt"
echo "Workspace: ${DATABRICKS_HOST}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"
```
Step 2: Collect Environment Info
```bash
# Environment info
echo "--- Environment ---" >> "$BUNDLE_DIR/summary.txt"
echo "CLI Version: $(databricks --version)" >> "$BUNDLE_DIR/summary.txt"
echo "Python: $(python --version 2>&1)" >> "$BUNDLE_DIR/summary.txt"
echo "Databricks SDK: $(pip show databricks-sdk 2>/dev/null | grep Version)" >> "$BUNDLE_DIR/summary.txt"
echo "DATABRICKS_HOST: ${DATABRICKS_HOST}" >> "$BUNDLE_DIR/summary.txt"
echo "DATABRICKS_TOKEN: ${DATABRICKS_TOKEN:+[SET]}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Workspace info
echo "--- Workspace Info ---" >> "$BUNDLE_DIR/summary.txt"
databricks current-user me >> "$BUNDLE_DIR/summary.txt" 2>&1 || echo "Failed to get user info"
echo "" >> "$BUNDLE_DIR/summary.txt"
```
Step 3: Collect Cluster Information
```bash
# Cluster details (if cluster_id provided)
CLUSTER_ID="${1:-}"
if [ -n "$CLUSTER_ID" ]; then
  echo "--- Cluster Info: $CLUSTER_ID ---" >> "$BUNDLE_DIR/summary.txt"
  databricks clusters get --cluster-id "$CLUSTER_ID" > "$BUNDLE_DIR/cluster_info.json" 2>&1

  # Extract key info
  jq -r '{
    state: .state,
    spark_version: .spark_version,
    node_type_id: .node_type_id,
    num_workers: .num_workers,
    autotermination_minutes: .autotermination_minutes
  }' "$BUNDLE_DIR/cluster_info.json" >> "$BUNDLE_DIR/summary.txt"

  # Get cluster events
  echo "--- Recent Cluster Events ---" >> "$BUNDLE_DIR/summary.txt"
  databricks clusters events --cluster-id "$CLUSTER_ID" --limit 20 > "$BUNDLE_DIR/cluster_events.json" 2>&1
  jq -r '.events[] | "\(.timestamp): \(.type) - \(.details)"' "$BUNDLE_DIR/cluster_events.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi
```
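Steps 3, 4, and 6 all hinge on the same optional-argument pattern: "${1:-}" defaults a missing positional parameter to an empty string, and the -n test then decides whether that section is collected at all. A standalone sketch of the pattern:

```shell
collect_cluster() {
  cluster_id="${1:-}"               # empty when no argument was passed
  if [ -n "$cluster_id" ]; then
    echo "collecting diagnostics for $cluster_id"
  else
    echo "no cluster id given; skipping cluster section"
  fi
}

collect_cluster                     # skip message
collect_cluster 0123-456789-abcde   # collect message
```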
Step 4: Collect Job Run Information
```bash
# Job run details (if run_id provided)
RUN_ID="${2:-}"
if [ -n "$RUN_ID" ]; then
  echo "--- Job Run Info: $RUN_ID ---" >> "$BUNDLE_DIR/summary.txt"
  databricks runs get --run-id "$RUN_ID" > "$BUNDLE_DIR/run_info.json" 2>&1

  # Extract run state
  jq -r '{
    state: .state.life_cycle_state,
    result: .state.result_state,
    message: .state.state_message,
    start_time: .start_time,
    end_time: .end_time
  }' "$BUNDLE_DIR/run_info.json" >> "$BUNDLE_DIR/summary.txt"

  # Get run output
  echo "--- Run Output ---" >> "$BUNDLE_DIR/summary.txt"
  databricks runs get-output --run-id "$RUN_ID" > "$BUNDLE_DIR/run_output.json" 2>&1
  jq -r '.error // "No error"' "$BUNDLE_DIR/run_output.json" >> "$BUNDLE_DIR/summary.txt"

  # Task-level details
  jq -r '.tasks[] | "Task \(.task_key): \(.state.result_state)"' "$BUNDLE_DIR/run_info.json" >> "$BUNDLE_DIR/summary.txt" 2>/dev/null
fi
```
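The start_time and end_time fields in the run JSON are epoch milliseconds. To eyeball durations while reviewing the bundle, convert them to UTC (GNU date shown; BSD date uses -r instead of -d):

```shell
start_ms=1700000000000
# Drop the millisecond part, then format as UTC
date -u -d "@$((start_ms / 1000))" +"%Y-%m-%dT%H:%M:%SZ"
```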
Step 5: Collect Spark Logs
```bash
# Spark driver logs (requires cluster_id)
if [ -n "$CLUSTER_ID" ]; then
  echo "--- Spark Driver Logs (truncated to 50,000 characters) ---" > "$BUNDLE_DIR/driver_logs.txt"
  # Get logs via the SDK
  python3 << EOF >> "$BUNDLE_DIR/driver_logs.txt" 2>&1
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
try:
    logs = w.clusters.get_cluster_driver_logs(cluster_id="$CLUSTER_ID")
    # Cap the capture so the bundle stays a reasonable size
    print(logs.log_content[:50000] if logs.log_content else "No logs available")
except Exception as e:
    print(f"Error fetching logs: {e}")
EOF
fi
```
Step 6: Collect Delta Table Info
```bash
# Delta table diagnostics (if table provided)
TABLE_NAME="${3:-}"
if [ -n "$TABLE_NAME" ]; then
  echo "--- Delta Table Info: $TABLE_NAME ---" >> "$BUNDLE_DIR/summary.txt"
  # Unquoted heredoc: the shell substitutes ${TABLE_NAME} into the Python source
  python3 << EOF >> "$BUNDLE_DIR/delta_info.txt" 2>&1
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# Table history
print("=== Table History ===")
spark.sql("DESCRIBE HISTORY ${TABLE_NAME} LIMIT 20").show(truncate=False)

# Table details
print("\n=== Table Details ===")
spark.sql("DESCRIBE DETAIL ${TABLE_NAME}").show(truncate=False)

# Schema
print("\n=== Schema ===")
spark.sql("DESCRIBE ${TABLE_NAME}").show(truncate=False)
EOF
fi
```
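Because TABLE_NAME is spliced directly into SQL text, it is worth rejecting anything that is not a plain catalog.schema.table identifier before running this step. A defensive sketch (an addition of this guide, not part of the original script):

```shell
# Accept only letters, digits, underscores, and dots
is_safe_table() {
  case "$1" in
    "" | *[!A-Za-z0-9_.]*) return 1 ;;
    *) return 0 ;;
  esac
}

is_safe_table "main.default.events" && echo "ok"
is_safe_table "t; DROP TABLE x" || echo "rejected"
```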
Step 7: Package Bundle
```bash
# Create config snapshot (redacted)
echo "--- Config (redacted) ---" >> "$BUNDLE_DIR/summary.txt"
sed 's/token = .*/token = REDACTED/' ~/.databrickscfg 2>/dev/null >> "$BUNDLE_DIR/config-redacted.txt"

# Network connectivity test
echo "--- Network Test ---" >> "$BUNDLE_DIR/summary.txt"
echo -n "API Health: " >> "$BUNDLE_DIR/summary.txt"
curl -s -o /dev/null -w "%{http_code}" "${DATABRICKS_HOST}/api/2.0/clusters/list" \
  -H "Authorization: Bearer ${DATABRICKS_TOKEN}" >> "$BUNDLE_DIR/summary.txt"
echo "" >> "$BUNDLE_DIR/summary.txt"

# Package everything
tar -czf "$BUNDLE_DIR.tar.gz" "$BUNDLE_DIR"
rm -rf "$BUNDLE_DIR"

echo "Bundle created: $BUNDLE_DIR.tar.gz"
echo ""
echo "Contents:"
echo "  - summary.txt: Environment and error summary"
echo "  - cluster_info.json: Cluster configuration"
echo "  - cluster_events.json: Recent cluster events"
echo "  - run_info.json: Job run details"
echo "  - run_output.json: Task outputs and errors"
echo "  - driver_logs.txt: Spark driver logs"
echo "  - delta_info.txt: Delta table diagnostics"
echo "  - config-redacted.txt: CLI configuration (secrets removed)"
```
Output
- databricks-debug-YYYYMMDD-HHMMSS.tar.gz archive containing:
  - summary.txt: Environment and error summary
  - cluster_info.json: Cluster configuration
  - cluster_events.json: Recent cluster events
  - run_info.json: Job run details
  - driver_logs.txt: Spark driver logs
  - config-redacted.txt: Configuration (secrets removed)
Error Handling
| Item | Purpose | Included |
|---|---|---|
| Environment versions | Compatibility check | Yes |
| Cluster config | Hardware/software setup | Yes |
| Cluster events | State changes, errors | Yes |
| Job run details | Task failures, timing | Yes |
| Spark logs | Stack traces, exceptions | Yes |
| Delta table info | Schema, history | Optional |
Examples
Sensitive Data Handling
ALWAYS REDACT:
- API tokens and secrets
- Personal access tokens
- Connection strings
- PII in logs
Safe to Include:
- Error messages
- Stack traces (check for PII)
- Cluster IDs, job IDs
- Configuration (without secrets)
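Databricks personal access tokens start with the literal prefix dapi, which makes a final scrub over the bundle text straightforward. The regex below is a best-effort pattern of this guide, not an exhaustive secret scanner:

```shell
redact_tokens() {
  # Replace anything that looks like a dapi... token with a placeholder
  sed -E 's/dapi[A-Za-z0-9]+/REDACTED/g'
}

echo "Authorization: Bearer dapi1234567890abcdef" | redact_tokens
# prints: Authorization: Bearer REDACTED
```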
Usage
```bash
# Basic bundle (environment only)
./databricks-debug-bundle.sh

# With cluster diagnostics
./databricks-debug-bundle.sh cluster-12345-abcde

# With job run diagnostics
./databricks-debug-bundle.sh cluster-12345-abcde 67890

# Full diagnostics with Delta table
./databricks-debug-bundle.sh cluster-12345-abcde 67890 catalog.schema.table
```
Submit to Support
- Create bundle: bash databricks-debug-bundle.sh [cluster-id] [run-id]
- Review the bundle for sensitive data
- Open a support ticket at Databricks Support
- Attach the bundle to the ticket
Resources
Next Steps
For rate limit issues, see databricks-rate-limits.