skill-test

Databricks Skills Testing Framework

Offline YAML-first evaluation with human-in-the-loop review and interactive skill improvement.
Quick References
- Scorers - Available scorers and quality gates
- YAML Schemas - Manifest and ground truth formats
- Python API - Programmatic usage examples
- Workflows - Detailed example workflows
- Trace Evaluation - Session trace analysis
/skill-test Command
The `/skill-test` command provides an interactive CLI for testing Databricks skills with real execution on Databricks.

Basic Usage

/skill-test <skill-name> [subcommand]

Subcommands
| Subcommand | Description |
|---|---|
| | Run evaluation against ground truth (default) |
| | Compare current results against baseline |
| | Initialize test scaffolding for a new skill |
| | Interactive: prompt -> invoke skill -> test -> save |
| | Add test case with trace evaluation |
| | Review pending candidates interactively |
| | Batch approve all pending candidates |
| | Save current results as regression baseline |
| | Run full MLflow evaluation with LLM judges |
| | Evaluate traces against skill expectations |
| | List available traces (MLflow or local) |
| | List configured scorers for a skill |
| | Add/remove scorers or update default guidelines |
| | Sync YAML to Unity Catalog (Phase 2) |
Quick Examples
/skill-test spark-declarative-pipelines run
/skill-test spark-declarative-pipelines add --trace
/skill-test spark-declarative-pipelines review --batch --filter-success
/skill-test my-new-skill init

See Workflows for detailed examples of each subcommand.
Execution Instructions
Environment Setup
```bash
uv pip install -e .test/
```

Environment variables for Databricks MLflow:

- `DATABRICKS_CONFIG_PROFILE` - Databricks CLI profile (default: "DEFAULT")
- `MLFLOW_TRACKING_URI` - Set to "databricks" for Databricks MLflow
- `MLFLOW_EXPERIMENT_NAME` - Experiment path (e.g., "/Users/{user}/skill-test")
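Putting those variables together, a typical shell setup might look like the following sketch (the profile and experiment-path values are illustrative placeholders, not values mandated by the framework):

```shell
# Use the default Databricks CLI profile
export DATABRICKS_CONFIG_PROFILE="DEFAULT"
# Point MLflow at Databricks-hosted tracking
export MLFLOW_TRACKING_URI="databricks"
# Experiment path under your user folder (illustrative email)
export MLFLOW_EXPERIMENT_NAME="/Users/someone@example.com/skill-test"
```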
Running Scripts
All subcommands have corresponding scripts in `.test/scripts/`:

```bash
uv run python .test/scripts/{subcommand}.py {skill_name} [options]
```

| Subcommand | Script |
|---|---|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
Use `--help` on any script for available options.
Command Handler
When `/skill-test` is invoked, parse arguments and execute the appropriate command.

Argument Parsing
- `args[0]` = skill_name (required)
- `args[1]` = subcommand (optional, default: "run")
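The parsing rules above can be sketched as a small helper. This is a hypothetical stand-in for illustration, not a function from the skill_test package:

```python
def parse_skill_test_args(args):
    """Parse /skill-test arguments: skill_name required, subcommand optional."""
    if not args:
        raise ValueError("skill_name is required")
    skill_name = args[0]
    # Default subcommand is "run" when none is given
    subcommand = args[1] if len(args) > 1 else "run"
    return skill_name, subcommand
```

For example, `parse_skill_test_args(["my-skill"])` returns `("my-skill", "run")`.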
Subcommand Routing
| Subcommand | Action |
|---|---|
| | Execute |
| | Execute |
| | Execute |
| | Prompt for test input, invoke skill, run |
| | Execute |
| | Execute |
| | Execute with MLflow logging |
| | Execute |
| | Execute |
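Routing of this kind is typically a lookup from subcommand name to handler. A minimal sketch, using stand-in handlers since the real ones live in the skill_test package:

```python
# Hypothetical dispatch table mirroring the routing above; handler
# bodies are illustrative stand-ins, not the real implementations.
HANDLERS = {
    "run": lambda skill: f"run evaluation for {skill}",
    "init": lambda skill: f"init scaffolding for {skill}",
}

def route(subcommand, skill_name):
    # Unknown subcommands fail fast rather than silently defaulting
    handler = HANDLERS.get(subcommand)
    if handler is None:
        raise ValueError(f"unknown subcommand: {subcommand}")
    return handler(skill_name)
```

The fail-fast branch matters: a typo'd subcommand should produce an error, not fall through to the default `run` behavior.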
init Behavior
When running `/skill-test <skill-name> init`:

- Read the skill's SKILL.md to understand its purpose
- Create `manifest.yaml` with appropriate scorers and trace_expectations
- Create empty `ground_truth.yaml` and `candidates.yaml` templates
- Recommend test prompts based on documentation examples

Follow with `/skill-test <skill-name> add` using recommended prompts.
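As an illustration only, an init-generated manifest might start out along these lines. Only the `scorers` and `trace_expectations` keys are named by the text above; everything else here is a guess, and the YAML Schemas reference is the authority on the real format:

```yaml
# Hypothetical manifest.yaml scaffold; consult the YAML Schemas
# reference for the actual fields and allowed values.
scorers:
  - correctness
trace_expectations: []
```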
Context Setup
Create CLIContext with MCP tools before calling any command. See Python API for details.
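The actual CLIContext construction is documented in the Python API reference. Purely to illustrate the shape of the setup step, a stand-in might look like this; every name besides `CLIContext` itself is hypothetical:

```python
from dataclasses import dataclass, field

@dataclass
class CLIContext:
    # Stand-in for skill_test's real CLIContext; fields are illustrative.
    skill_name: str
    mcp_tools: list = field(default_factory=list)

# Build the context (with MCP tools attached) before dispatching any subcommand
ctx = CLIContext(skill_name="spark-declarative-pipelines", mcp_tools=["databricks"])
```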
File Locations
Important: All test files are stored at the repository root level, not relative to this skill's directory.
| File Type | Path |
|---|---|
| Ground truth | `.test/skills/{skill-name}/ground_truth.yaml` |
| Candidates | `.test/skills/{skill-name}/candidates.yaml` |
| Manifest | `.test/skills/{skill-name}/manifest.yaml` |
| Routing tests | `.test/skills/_routing/` |
| Baselines | `.test/baselines/` |
For example, to test `spark-declarative-pipelines` in this repository:

/Users/.../ai-dev-kit/.test/skills/spark-declarative-pipelines/ground_truth.yaml

Not relative to the skill definition:

/Users/.../ai-dev-kit/.claude/skills/skill-test/skills/...   # WRONG

Directory Structure
.test/ # At REPOSITORY ROOT (not skill directory)
├── pyproject.toml # Package config (pip install -e ".test/")
├── README.md # Contributor documentation
├── SKILL.md # Source of truth (synced to .claude/skills/)
├── install_skill_test.sh # Sync script
├── scripts/ # Wrapper scripts
│ ├── _common.py # Shared utilities
│ ├── run_eval.py
│ ├── regression.py
│ ├── init_skill.py
│ ├── add.py
│ ├── baseline.py
│ ├── mlflow_eval.py
│ ├── routing_eval.py
│ ├── trace_eval.py # Trace evaluation
│ ├── list_traces.py # List available traces
│ ├── scorers.py
│ ├── scorers_update.py
│ └── sync.py
├── src/
│ └── skill_test/ # Python package
│ ├── cli/ # CLI commands module
│ ├── fixtures/ # Test fixture setup
│ ├── scorers/ # Evaluation scorers
│ ├── grp/ # Generate-Review-Promote pipeline
│ └── runners/ # Evaluation runners
├── skills/ # Per-skill test definitions
│ ├── _routing/ # Routing test cases
│ └── {skill-name}/ # Skill-specific tests
│ ├── ground_truth.yaml
│ ├── candidates.yaml
│ └── manifest.yaml
├── tests/ # Unit tests
├── references/ # Documentation references
└── baselines/ # Regression baselines

References
- Scorers - Available scorers and quality gates
- YAML Schemas - Manifest and ground truth formats
- Python API - Programmatic usage examples
- Workflows - Detailed example workflows
- Trace Evaluation - Session trace analysis