addon-deterministic-eval-suite

Add-on: Deterministic Eval Suite

Use this skill when a project needs reproducible, merge-blocking evaluation checks.

Compatibility

  • Works with all architect-* scaffolds.
  • Recommended default for production-default mode.

Inputs

Collect:
  • EVAL_SCOPE: skill-only | project-only | both (default both).
  • BLOCK_ON_FAIL: yes | no (default yes).
  • RUN_DOCKER_CHECKS: yes | no (default yes for production-default).
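
The inputs above could be resolved and validated before any check runs. The following is a minimal sketch, not part of the skill itself: the helper names is_one_of and resolve_inputs are hypothetical, the inputs are assumed to arrive as environment variables, and RUN_DOCKER_CHECKS is assumed to default to yes because the source only states its default for production-default mode.

```bash
# Sketch only: is_one_of and resolve_inputs are hypothetical helper names.

# is_one_of VALUE ALLOWED...: succeed when VALUE equals one of the allowed words.
is_one_of() {
  candidate=$1
  shift
  for allowed in "$@"; do
    if [ "$candidate" = "$allowed" ]; then
      return 0
    fi
  done
  return 1
}

# resolve_inputs: apply defaults, validate, then print KEY=VALUE lines.
# The parenthesized body runs in a subshell, so nothing leaks into the caller.
resolve_inputs() (
  scope=${EVAL_SCOPE:-both}
  block=${BLOCK_ON_FAIL:-yes}
  # Assumption: production-default mode, where the documented default is "yes".
  docker=${RUN_DOCKER_CHECKS:-yes}

  is_one_of "$scope" skill-only project-only both ||
    { echo "invalid EVAL_SCOPE: $scope" >&2; exit 2; }
  is_one_of "$block" yes no ||
    { echo "invalid BLOCK_ON_FAIL: $block" >&2; exit 2; }
  is_one_of "$docker" yes no ||
    { echo "invalid RUN_DOCKER_CHECKS: $docker" >&2; exit 2; }

  printf 'EVAL_SCOPE=%s\nBLOCK_ON_FAIL=%s\nRUN_DOCKER_CHECKS=%s\n' \
    "$scope" "$block" "$docker"
)
```

Rejecting unknown values up front, rather than silently falling back to a default, keeps a typo such as BLOCK_ON_FAIL=ye from quietly changing the behavior of the merge gate.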

Integration Workflow

  1. Add deterministic eval artifacts:
text
evals/deterministic/manifest.yaml
evals/deterministic/run.sh
evals/deterministic/checks/
.github/workflows/evals-deterministic.yml
  2. Baseline checks (always include):
    • file/contract existence checks
    • lint/type/test/build command checks
    • docker artifact checks (Dockerfile, docker-compose.yml, image build)
    • decision trace checks (docs/DECISION_LOG.md, REVIEW_BUNDLE/DECISION_TRACE.md)
    • non-zero exit on failure
    • for skills repositories: add repository-local checks that validate skill folder/frontmatter naming
    • for skills repositories: add repository-local checks that validate required decision-policy language
  3. Skill-specific checks:
    • one check file per selected skill
    • examples:
      • check_nostr_profile.sh
      • check_rag_ingest_query.sh
      • check_review_bundle.sh
      • check_decision_trace.sh
      • check_skill_repo_policy.sh
  4. Output summary:
    • write deterministic run summary to REVIEW_BUNDLE/TEST_EVIDENCE.md.
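
As an illustration of a baseline check, the decision trace check from step 2 might look like the sketch below. The function names are hypothetical, and treating an empty file the same as a missing one is an assumption of this sketch rather than something the skill specifies.

```bash
# Sketch of what checks/check_decision_trace.sh could contain.
# check_required_files and check_decision_trace are hypothetical names.

# check_required_files ROOT FILE...: report every file that is missing or
# empty under ROOT, and return non-zero if there was at least one.
check_required_files() (
  root=$1
  shift
  missing=0
  for rel in "$@"; do
    # -s is true only for an existing file with size > 0 (assumed policy).
    if [ ! -s "$root/$rel" ]; then
      echo "MISSING: $rel" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)

# check_decision_trace [ROOT]: the two decision artifacts named in step 2.
check_decision_trace() {
  check_required_files "${1:-.}" \
    docs/DECISION_LOG.md \
    REVIEW_BUNDLE/DECISION_TRACE.md
}
```

A script built on this would end with a call such as `check_decision_trace . || exit 1`, so that a missing artifact produces the non-zero exit the baseline requires.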

Required Template

evals/deterministic/manifest.yaml

yaml
version: 1
checks:
  - id: contracts
    command: "bash evals/deterministic/checks/check_contracts.sh"
  - id: tests
    command: "bash evals/deterministic/checks/check_tests.sh"
  - id: build
    command: "bash evals/deterministic/checks/check_build.sh"
  - id: decision_trace
    command: "bash evals/deterministic/checks/check_decision_trace.sh"
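
One way evals/deterministic/run.sh could consume this manifest is sketched below. It is an assumption-laden illustration: list_commands and run_manifest are hypothetical names, the extraction is line-based and only handles the flat `command: "..."` layout shown in the template (it is not a YAML parser), and the summary format is invented for the example.

```bash
# Sketch of a manifest runner; function names and summary format are hypothetical.

# list_commands MANIFEST: print one command per line.
# Line-based extraction that assumes each command sits on a single line
# in the form: command: "..."
list_commands() {
  sed -n 's/^[[:space:]]*command:[[:space:]]*"\(.*\)"[[:space:]]*$/\1/p' "$1"
}

# run_manifest MANIFEST SUMMARY: run every check, record PASS/FAIL lines in
# SUMMARY, and return non-zero if any check failed or none were found.
run_manifest() (
  manifest=$1
  summary=$2
  [ -f "$manifest" ] || { echo "manifest not found: $manifest" >&2; exit 1; }

  commands=$(list_commands "$manifest")
  # Fail closed: a manifest without runnable checks counts as a failure.
  [ -n "$commands" ] || { echo "no checks found in $manifest" >&2; exit 1; }

  mkdir -p "$(dirname "$summary")"
  printf '%s\n\n' '# Deterministic eval summary' > "$summary"

  failed=0
  # The here-document keeps the loop in this shell, so $failed survives it.
  while IFS= read -r cmd; do
    # </dev/null stops a check from consuming the remaining command list.
    if sh -c "$cmd" </dev/null; then
      printf '%s\n' "- PASS: $cmd" >> "$summary"
    else
      printf '%s\n' "- FAIL: $cmd" >> "$summary"
      failed=1
    fi
  done <<EOF
$commands
EOF
  [ "$failed" -eq 0 ]
)
```

A run.sh built this way would call `run_manifest evals/deterministic/manifest.yaml REVIEW_BUNDLE/TEST_EVIDENCE.md` and exit with its status. The summary deliberately omits timestamps so that two runs over the same tree produce identical output.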

Guardrails

  • Documentation contract for generated code:
    • Python: write module docstrings and docstrings for public classes, methods, and functions.
    • Next.js/TypeScript: write JSDoc for exported components, hooks, utilities, and route handlers.
    • Add concise rationale comments only for non-obvious logic, invariants, or safety constraints.
    • Apply this contract even when using template snippets below; expand templates as needed.
  • Deterministic evals are source-of-truth merge gates.
  • Avoid network-dependent assertions unless explicitly required.
  • Keep commands idempotent and non-destructive.
  • Fail closed: missing required checks must fail the run.
  • Treat missing decision rationale artifacts as deterministic failure.
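
The fail-closed guardrail can be enforced before any check runs by confirming that the manifest still declares every required check. The sketch below assumes check ids contain only letters, digits, and underscores (they are interpolated into a regular expression) and that each id appears on its own `- id: NAME` line; manifest_has_id and require_check_ids are hypothetical names.

```bash
# Sketch of a fail-closed guard; function names are hypothetical.

# manifest_has_id MANIFEST ID: succeed if the manifest has a line "- id: ID".
# Assumes ID contains only letters, digits, and underscores.
manifest_has_id() {
  grep -Eq "^[[:space:]]*-[[:space:]]*id:[[:space:]]*$2[[:space:]]*\$" "$1"
}

# require_check_ids MANIFEST ID...: report every required id that is absent
# and return non-zero if any is. An unreadable manifest also fails, because
# grep then exits non-zero for every id.
require_check_ids() (
  manifest=$1
  shift
  missing=0
  for id in "$@"; do
    if ! manifest_has_id "$manifest" "$id" 2>/dev/null; then
      echo "required check missing from manifest: $id" >&2
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
)
```

With the template manifest, the guard would be invoked as `require_check_ids evals/deterministic/manifest.yaml contracts tests build decision_trace`. Deleting a check entry then fails the run instead of silently narrowing the gate.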

Validation Checklist

  • Confirm generated code includes required docstrings/JSDoc and rationale comments for non-obvious logic.
bash
test -f evals/deterministic/manifest.yaml
test -f evals/deterministic/run.sh
test -f .github/workflows/evals-deterministic.yml
bash evals/deterministic/run.sh

Decision Justification Rule

  • Every non-trivial decision must include a concrete justification.
  • Capture the alternatives considered and why they were rejected.
  • State tradeoffs and residual risks for the chosen option.
  • If justification is missing, treat the task as incomplete and surface it as a blocker.
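
Part of this rule can be checked mechanically if the decision log follows a fixed entry layout. The source does not define one, so the sketch below assumes a purely hypothetical format: each decision starts with a `## ` heading and carries lines beginning with `Justification:`, `Alternatives:`, and `Tradeoffs:`. The function name check_decision_entries is likewise an assumption.

```bash
# Sketch only: the "## " heading and the three field labels are a
# hypothetical log layout, not one defined by the skill.

# check_decision_entries FILE: print each decision that lacks one of the
# three fields and return non-zero if any entry is incomplete.
check_decision_entries() {
  awk '
    # Report the entry collected so far if it is missing a field.
    function report() {
      if (title != "" && !(just && alt && trade)) {
        printf "incomplete decision: %s\n", title
        bad = 1
      }
    }
    /^## /            { report(); title = substr($0, 4); just = alt = trade = 0; next }
    /^Justification:/ { just = 1 }
    /^Alternatives:/  { alt = 1 }
    /^Tradeoffs:/     { trade = 1 }
    END               { report(); exit (bad ? 1 : 0) }
  ' "$1"
}
```

This only verifies that the fields exist; whether a justification is concrete enough remains a review judgment. A missing log file also yields a non-zero status, since awk fails to open it.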