addon-deterministic-eval-suite

Add-on: Deterministic Eval Suite

Use this skill when a project needs reproducible, merge-blocking evaluation checks.

Compatibility

Works with all `architect-*` scaffolds.
Recommended default for `production-default` mode.

Inputs

Collect:

`EVAL_SCOPE`: `skill-only` | `project-only` | `both` (default `both`).
`BLOCK_ON_FAIL`: `yes` | `no` (default `yes`).
`RUN_DOCKER_CHECKS`: `yes` | `no` (default `yes` for `production-default`).
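
The inputs above could be resolved and validated along these lines. This is a minimal sketch, not the skill's actual mechanism: reading the values from environment variables and the helper names `validate_choice` and `resolve_inputs` are assumptions made for illustration. The source only states the `RUN_DOCKER_CHECKS` default for `production-default`; the sketch applies `yes` everywhere.

```shell
#!/usr/bin/env bash
# Sketch: resolve the three inputs with their documented defaults.
# Assumption: values arrive as environment variables (not stated in the source).
set -euo pipefail

# validate_choice NAME VALUE ALLOWED...
# Returns 0 if VALUE is one of ALLOWED, otherwise prints an error and returns 1.
validate_choice() {
  local name="$1" value="$2"
  shift 2
  local allowed
  for allowed in "$@"; do
    if [ "$value" = "$allowed" ]; then
      return 0
    fi
  done
  echo "invalid $name: '$value' (expected one of: $*)" >&2
  return 1
}

# resolve_inputs: apply defaults, then reject anything outside the allowed sets.
resolve_inputs() {
  EVAL_SCOPE="${EVAL_SCOPE:-both}"
  BLOCK_ON_FAIL="${BLOCK_ON_FAIL:-yes}"
  # Assumption: "yes" is used in every mode; the source only specifies it
  # for production-default.
  RUN_DOCKER_CHECKS="${RUN_DOCKER_CHECKS:-yes}"

  validate_choice EVAL_SCOPE "$EVAL_SCOPE" skill-only project-only both || return 1
  validate_choice BLOCK_ON_FAIL "$BLOCK_ON_FAIL" yes no || return 1
  validate_choice RUN_DOCKER_CHECKS "$RUN_DOCKER_CHECKS" yes no || return 1
}

resolve_inputs
echo "EVAL_SCOPE=$EVAL_SCOPE BLOCK_ON_FAIL=$BLOCK_ON_FAIL RUN_DOCKER_CHECKS=$RUN_DOCKER_CHECKS"
```

Rejecting unknown values up front keeps a typo such as `EVAL_SCOPE=skills` from silently falling through to a default.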

Integration Workflow

Add deterministic eval artifacts:

evals/deterministic/manifest.yaml
evals/deterministic/run.sh
evals/deterministic/checks/
.github/workflows/evals-deterministic.yml

Baseline checks (always include):

file/contract existence checks
lint/type/test/build command checks
docker artifact checks (`Dockerfile`, `docker-compose.yml`, image build)
decision trace checks (`docs/DECISION_LOG.md`, `REVIEW_BUNDLE/DECISION_TRACE.md`)
non-zero exit on failure
for skills repositories: add repository-local checks that validate skill folder/frontmatter naming
for skills repositories: add repository-local checks that validate required decision-policy language
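
A file-existence check from the baseline list might look roughly like the following. It is a sketch under assumptions: the function name `check_files_exist` and the throwaway demo directory are hypothetical, and the source does not show the contents of any real check script. The demo deliberately leaves `REVIEW_BUNDLE/DECISION_TRACE.md` out so the failure path is exercised.

```shell
#!/usr/bin/env bash
# Sketch of a baseline existence check (hypothetical helper, not from the source).
set -euo pipefail

# check_files_exist ROOT PATH...
# Reports each path relative to ROOT and returns non-zero if any are missing.
check_files_exist() {
  local root="$1"
  shift
  local missing=0 path
  for path in "$@"; do
    if [ -e "$root/$path" ]; then
      echo "ok      $path"
    else
      echo "MISSING $path" >&2
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Demo against a temporary directory so the sketch runs anywhere.
demo_root="$(mktemp -d)"
trap 'rm -rf "$demo_root"' EXIT
mkdir -p "$demo_root/docs" "$demo_root/REVIEW_BUNDLE"
touch "$demo_root/Dockerfile" "$demo_root/docker-compose.yml" "$demo_root/docs/DECISION_LOG.md"

if check_files_exist "$demo_root" \
    Dockerfile docker-compose.yml \
    docs/DECISION_LOG.md REVIEW_BUNDLE/DECISION_TRACE.md; then
  echo "contracts: pass"
else
  # A real check script would exit 1 here to satisfy "non-zero exit on failure".
  echo "contracts: fail"
fi
```

Listing every missing path before failing, rather than stopping at the first, gives a reviewer the complete picture in a single run.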

Skill-specific checks:

one check file per selected skill
examples: `check_nostr_profile.sh`, `check_rag_ingest_query.sh`, `check_review_bundle.sh`, `check_decision_trace.sh`, `check_skill_repo_policy.sh`
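
The "one check file per selected skill" rule can itself be verified mechanically. The sketch below assumes a naming convention inferred from the example filenames (`check_<skill>.sh`, with hyphens in the skill name mapped to underscores). That mapping, the skill names used in the demo, and both helper functions are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: confirm each selected skill has a matching check file.
# Assumption: skill "nostr-profile" maps to "check_nostr_profile.sh".
set -euo pipefail

# check_file_for_skill SKILL: print the expected check filename.
check_file_for_skill() {
  local skill="$1"
  echo "check_${skill//-/_}.sh"
}

# missing_skill_checks CHECKS_DIR SKILL...
# Prints the filename of every absent check and returns non-zero if any are absent.
missing_skill_checks() {
  local dir="$1"
  shift
  local missing=0 skill file
  for skill in "$@"; do
    file="$(check_file_for_skill "$skill")"
    if [ ! -f "$dir/$file" ]; then
      echo "$file"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Demo in a temporary checks directory (hypothetical skill names).
demo_checks="$(mktemp -d)"
trap 'rm -rf "$demo_checks"' EXIT
touch "$demo_checks/check_nostr_profile.sh"

if missing_skill_checks "$demo_checks" nostr-profile rag-ingest-query; then
  echo "every selected skill has a check file"
else
  echo "some selected skills have no check file"
fi
```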

Output summary:

write deterministic run summary to `REVIEW_BUNDLE/TEST_EVIDENCE.md`.
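
One way a runner could produce that summary is sketched here. The `run_checks` function, its `id=command` argument format, and the table layout are assumptions, not the contents of the real `run.sh`. The placeholder commands `true` and `false` stand in for actual check scripts. The summary omits timestamps so that two runs over the same inputs write identical files.

```shell
#!/usr/bin/env bash
# Sketch of a runner that records pass/fail per check (hypothetical layout).
set -euo pipefail

# run_checks SUMMARY_FILE "id=command"...
# Runs each command, writes a result table to SUMMARY_FILE,
# and returns non-zero if any check failed.
run_checks() {
  local summary="$1"
  shift
  local failed=0 entry id cmd status
  mkdir -p "$(dirname "$summary")"
  {
    echo "# Deterministic eval summary"
    echo
    echo "| check | result |"
    echo "| --- | --- |"
  } > "$summary"
  for entry in "$@"; do
    id="${entry%%=*}"   # text before the first "="
    cmd="${entry#*=}"   # text after the first "="
    if bash -c "$cmd" >/dev/null 2>&1; then
      status="pass"
    else
      status="fail"
      failed=$((failed + 1))
    fi
    echo "| $id | $status |" >> "$summary"
  done
  [ "$failed" -eq 0 ]
}

# Demo with placeholder commands in a temporary directory.
demo_dir="$(mktemp -d)"
trap 'rm -rf "$demo_dir"' EXIT
evidence="$demo_dir/REVIEW_BUNDLE/TEST_EVIDENCE.md"

if run_checks "$evidence" "contracts=true" "tests=false"; then
  echo "all checks passed"
else
  echo "at least one check failed"
fi
cat "$evidence"
```

In a real `run.sh` the `id=command` pairs would presumably come from `evals/deterministic/manifest.yaml`, and the script would exit with the function's status when `BLOCK_ON_FAIL` is `yes`.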

Required Template

evals/deterministic/manifest.yaml

version: 1
checks:
  - id: contracts
    command: "bash evals/deterministic/checks/check_contracts.sh"
  - id: tests
    command: "bash evals/deterministic/checks/check_tests.sh"
  - id: build
    command: "bash evals/deterministic/checks/check_build.sh"
  - id: decision_trace
    command: "bash evals/deterministic/checks/check_decision_trace.sh"