spark-consumption-cli
Update Check — ONCE PER SESSION (mandatory)
The first time this skill is used in a session, run the check-updates skill before proceeding.
- GitHub Copilot CLI / VS Code: invoke the `check-updates` skill.
- Claude Code / Cowork / Cursor / Windsurf / Codex: compare local vs remote package.json version.
- Skip if the check was already performed earlier in this session.
CRITICAL NOTES
- To find workspace details (including its ID) from a workspace name: list all workspaces, then use JMESPath filtering
- To find item details (including its ID) from a workspace ID, item type, and item name: list all items of that type in that workspace, then use JMESPath filtering
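The two-step lookup above can be sketched offline. The workspace list below is a hypothetical stand-in for the real `az rest` response, and the filter is done with python3 so the sketch runs without Azure access; against the live API you would express the same filter directly as a JMESPath `--query`, e.g. `value[?displayName=='Sales'] | [0].id`.

```shell
# Offline sketch of "list all, then filter by name".
# Hypothetical data; in practice the list comes from:
#   az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
#     --url "$FABRIC_API_URL/workspaces" \
#     --query "value[?displayName=='Sales'] | [0].id" --output tsv
workspaces='{"value":[{"displayName":"Sales","id":"ws-111"},{"displayName":"Finance","id":"ws-222"}]}'

# Same filter as the JMESPath above, done with python3 so it runs anywhere:
workspaceId=$(printf '%s' "$workspaces" | python3 -c '
import json, sys
items = json.load(sys.stdin)["value"]
match = [w["id"] for w in items if w["displayName"] == "Sales"]
print(match[0] if match else "")')
echo "workspaceId=$workspaceId"
```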
Data Engineering Consumption — CLI Skill
Table of Contents
| Task | Reference | Notes |
|---|---|---|
| Fabric Topology & Key Concepts | COMMON-CORE.md § Fabric Topology & Key Concepts | |
| Environment URLs | COMMON-CORE.md § Environment URLs | |
| Authentication & Token Acquisition | COMMON-CORE.md § Authentication & Token Acquisition | Wrong audience = 401; read before any auth issue |
| Core Control-Plane REST APIs | COMMON-CORE.md § Core Control-Plane REST APIs | |
| Pagination | COMMON-CORE.md § Pagination | |
| Long-Running Operations (LRO) | COMMON-CORE.md § Long-Running Operations (LRO) | |
| Rate Limiting & Throttling | COMMON-CORE.md § Rate Limiting & Throttling | |
| OneLake Data Access | COMMON-CORE.md § OneLake Data Access | Requires |
| Job Execution | COMMON-CORE.md § Job Execution | |
| Capacity Management | COMMON-CORE.md § Capacity Management | |
| Gotchas & Troubleshooting | COMMON-CORE.md § Gotchas & Troubleshooting | |
| Best Practices | COMMON-CORE.md § Best Practices | |
| Tool Selection Rationale | COMMON-CLI.md § Tool Selection Rationale | |
| Finding Workspaces and Items in Fabric | COMMON-CLI.md § Finding Workspaces and Items in Fabric | Mandatory — READ link first [needed for finding workspace id by its name or item id by its name, item type, and workspace id] |
| Authentication Recipes | COMMON-CLI.md § Authentication Recipes | |
| Fabric Control-Plane API via az rest | COMMON-CLI.md § Fabric Control-Plane API via az rest | Always pass |
| Pagination Pattern | COMMON-CLI.md § Pagination Pattern | |
| Long-Running Operations (LRO) Pattern | COMMON-CLI.md § Long-Running Operations (LRO) Pattern | |
| OneLake Data Access via curl | COMMON-CLI.md § OneLake Data Access via curl | Use |
| SQL / TDS Data-Plane Access | COMMON-CLI.md § SQL / TDS Data-Plane Access | |
| Job Execution (CLI) | COMMON-CLI.md § Job Execution | |
| OneLake Shortcuts | COMMON-CLI.md § OneLake Shortcuts | |
| Capacity Management (CLI) | COMMON-CLI.md § Capacity Management | |
| Composite Recipes | COMMON-CLI.md § Composite Recipes | |
| Gotchas & Troubleshooting (CLI-Specific) | COMMON-CLI.md § Gotchas & Troubleshooting (CLI-Specific) | |
| Quick Reference: az rest Template | COMMON-CLI.md § Quick Reference: az rest Template | |
| Quick Reference: Token Audience ↔ CLI Tool Matrix | COMMON-CLI.md § Quick Reference: Token Audience ↔ CLI Tool Matrix | Which |
| Relationship to SPARK-AUTHORING-CORE.md | SPARK-CONSUMPTION-CORE.md § Relationship to SPARK-AUTHORING-CORE.md | |
| Data Engineering Consumption Capability Matrix | SPARK-CONSUMPTION-CORE.md § Data Engineering Consumption Capability Matrix | |
| OneLake Table APIs (Schema-enabled Lakehouses) | SPARK-CONSUMPTION-CORE.md § OneLake Table APIs (Schema-enabled Lakehouses) | Unity Catalog-compatible metadata; requires |
| Livy Session Management | SPARK-CONSUMPTION-CORE.md § Livy Session Management | Session creation, states, lifecycle, termination |
| Interactive Data Exploration | SPARK-CONSUMPTION-CORE.md § Interactive Data Exploration | Statement execution, output retrieval, data discovery |
| PySpark Analytics Patterns | SPARK-CONSUMPTION-CORE.md § PySpark Analytics Patterns | Cross-lakehouse 3-part naming, performance optimization |
| Must/Prefer/Avoid | SKILL.md § Must/Prefer/Avoid | MUST DO / AVOID / PREFER checklists |
| Quick Start | SKILL.md § Quick Start | CLI-specific Livy session setup and data exploration |
| Key Fabric Patterns | SKILL.md § Key Fabric Patterns | Spark pattern quick-reference table |
| Session Cleanup | SKILL.md § Session Cleanup | Clean up idle Livy sessions via CLI |
Must/Prefer/Avoid
MUST DO
- Check for existing idle sessions before creating new ones
- Use dynamic workspace/lakehouse discovery
- Follow API patterns from COMMON-CLI.md
PREFER
- sqldw-consumption-cli for simple lakehouse queries — row counts, SELECT, schema exploration, filtering, and aggregation on lakehouse Delta tables should use the SQL Endpoint via `sqlcmd`, not Spark. Only use this skill when the user explicitly requests PySpark, DataFrames, or Spark-specific features.
- SQL Endpoint for Delta tables
- Livy for unstructured/JSON data or complex Python analytics
- Session reuse over creation
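As a concrete illustration of the first PREFER item, a row count belongs on the SQL Endpoint. The server, database, and table names below are placeholders, and the `sqlcmd` invocation is shown as a comment because it requires network access and Entra ID authentication:

```shell
# Placeholder T-SQL for a simple aggregation (table name is hypothetical):
cat > /tmp/rowcount.sql << 'EOF'
SELECT COUNT(*) AS row_count FROM dbo.sales;
EOF
# Run it against the lakehouse SQL Endpoint (placeholders; -G = Entra ID auth):
# sqlcmd -S "<endpoint>.datawarehouse.fabric.microsoft.com" -d "<lakehouse>" -G -i /tmp/rowcount.sql
```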
AVOID
- Hardcoded workspace IDs
- Creating unnecessary sessions
- Large result sets without LIMIT
Quick Start
Environment Setup
Apply environment detection from COMMON-CORE.md Environment Detection Pattern to set:
- `$FABRIC_API_BASE` and `$FABRIC_RESOURCE_SCOPE`
- `$FABRIC_API_URL` and `$LIVY_API_PATH` for Livy operations
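For reference, the public commercial cloud typically uses values like the sketch below. These are assumptions (sovereign clouds use different hosts and the Livy API version path may change), so apply the COMMON-CORE.md detection pattern rather than hardcoding them:

```shell
# Assumed public-cloud values; sovereign clouds differ.
export FABRIC_API_BASE="https://api.fabric.microsoft.com"
export FABRIC_API_URL="$FABRIC_API_BASE/v1"
export FABRIC_RESOURCE_SCOPE="https://api.fabric.microsoft.com"
export LIVY_API_PATH="livyApi/versions/2023-12-01"  # Livy API version path (assumed)
```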
Authentication: Use token acquisition from COMMON-CLI.md Environment Detection and API Configuration
Workspace & Item Discovery
Preferred: Use COMMON-CLI.md item discovery patterns (Finding things in Fabric) to find workspaces and items by name.
Fallback (when workspace is already known):
```bash
# List workspaces
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces" --query "value[].{name:displayName, id:id}" --output table
read -p "Workspace ID: " workspaceId

# List lakehouses in workspace
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/items?type=Lakehouse" --query "value[].{name:displayName, id:id}" --output table
read -p "Lakehouse ID: " lakehouseId
```
Session Management
```bash
# Check for existing idle session (avoid resource waste)
sessionId=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" --query "sessions[?state=='idle'] | [0].id" --output tsv)

# Create if none available - FORCE STARTER POOL USAGE
if [[ -z "$sessionId" ]]; then
  cat > /tmp/body.json << 'EOF'
{
  "name": "analysis",
  "driverMemory": "56g",
  "driverCores": 8,
  "executorMemory": "56g",
  "executorCores": 8,
  "conf": {
    "spark.dynamicAllocation.enabled": "true",
    "spark.fabric.pool.name": "Starter Pool"
  }
}
EOF
  sessionId=$(az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" --body @/tmp/body.json --query "id" --output tsv)
  echo "⏳ Waiting for starter pool session to be ready..."
  # With starter pools, this should take 3-5 seconds
  timeout=30  # Reduced from 90s since starter pools are fast
  while [ $timeout -gt 0 ]; do
    state=$(az rest --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId" --query "state" --output tsv)
    if [[ "$state" == "idle" ]]; then
      echo "✅ Session ready in starter pool!"
      break
    fi
    echo "   Session state: $state (${timeout}s remaining)"
    sleep 3
    timeout=$((timeout - 3))
  done
fi
```
Data Exploration (Fabric-Specific Patterns)
```bash
# Execute statement (LLM knows Python/Spark syntax)
cat > /tmp/body.json << 'EOF'
{
  "code": "spark.sql('SHOW TABLES').show(); df = spark.table('your_table'); df.describe().show()",
  "kind": "pyspark"
}
EOF
az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" --body @/tmp/body.json
```
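Statement execution is asynchronous: the POST returns a statement id, and output is fetched by polling GET `.../statements/<id>` until `state` is `available`. The sketch below uses a hypothetical `az` stand-in that returns a canned Livy response so the parsing logic runs offline; remove the mock function to call the real service.

```shell
# Hypothetical offline stand-in for `az rest` (canned statement response);
# delete this function in practice so the real Azure CLI is used.
az() {
  echo '{"id":0,"state":"available","output":{"status":"ok","data":{"text/plain":"+----+\n|  42|\n+----+"}}}'
}

resp=$(az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" \
  --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements/0")
state=$(printf '%s' "$resp" | python3 -c 'import json,sys; print(json.load(sys.stdin)["state"])')
if [ "$state" = "available" ]; then
  # Livy puts the rendered result under output.data["text/plain"]
  printf '%s' "$resp" | python3 -c 'import json,sys; print(json.load(sys.stdin)["output"]["data"]["text/plain"])'
fi
```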
Key Fabric Patterns
| Pattern | Code | Use Case |
|---|---|---|
| Table Discovery | | List available tables |
| Cross-Lakehouse | | Query across workspaces |
| Delta Features | | Time travel, versioning |
| Schema Evolution | | Understand structure |
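As hedged examples of the four patterns (the table and lakehouse names below are placeholders, not names from this document), they can be expressed as a single pyspark Livy statement payload:

```shell
# Placeholder examples: table discovery, cross-lakehouse 3-part naming,
# Delta history (versioning), and schema inspection.
cat > /tmp/patterns.json << 'EOF'
{
  "code": "spark.sql('SHOW TABLES').show(); spark.table('other_lakehouse.dbo.orders').limit(10).show(); spark.sql('DESCRIBE HISTORY orders').show(); spark.table('orders').printSchema()",
  "kind": "pyspark"
}
EOF
# POST it exactly like the Data Exploration statement:
# az rest --method post --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/$sessionId/statements" --body @/tmp/patterns.json
```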
Session Cleanup
```bash
# Clean up idle sessions (optional)
az rest --method get --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions" --query "sessions[?state=='idle'].id" --output tsv | xargs -I {} az rest --method delete --resource "$FABRIC_RESOURCE_SCOPE" --url "$FABRIC_API_URL/workspaces/$workspaceId/lakehouses/$lakehouseId/$LIVY_API_PATH/sessions/{}"
```

---
**Focus**: This skill provides Fabric-specific REST API patterns. The LLM already knows Python/Spark syntax — we focus on Fabric integration, session management, and API endpoints.