bare-eval
Bare Eval: Isolated Evaluation Calls
Run `claude -p --bare` for fast, clean eval/grading without plugin overhead. Requires CC 2.1.81 or later. The `--bare` flag skips hooks, LSP, plugin sync, and skill directory walks.
When to Use
- Grading skill outputs against assertions
- Trigger classification (which skill matches a prompt)
- Description optimization iterations
- Any scripted `-p` call that doesn't need plugins
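As an illustration of a scripted grading call, here is a minimal sketch. The helper name `grade_output`, the `CLAUDE_CMD` override, and the prompt wording are all hypothetical and not part of this skill. The sketch assumes only that `claude -p --bare` accepts the prompt as its final argument.

```bash
# Hypothetical helper: asks whether OUTPUT satisfies ASSERTION.
# CLAUDE_CMD defaults to the bare invocation described in this document;
# override it (for example with `echo`) to do a dry run without the CLI.
grade_output() {
  output="$1"
  assertion="$2"
  prompt="Does the following output satisfy the assertion? Answer PASS or FAIL.
Assertion: $assertion
Output: $output"
  # Intentionally unquoted so the command string splits into command + flags.
  ${CLAUDE_CMD:-claude -p --bare} "$prompt"
}
```

A dry run such as `CLAUDE_CMD=echo grade_output "hello world" "mentions hello"` prints the prompt that would be sent, which is handy for checking the script before spending API calls.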
When NOT to Use
- Testing skill routing (needs `--plugin-dir`)
- Testing agent orchestration (needs full plugin context)
- Interactive sessions
Prerequisites
```bash
# --bare requires ANTHROPIC_API_KEY (OAuth/keychain disabled)
export ANTHROPIC_API_KEY="sk-ant-..."

# Verify CC version
claude --version  # Must be >= 2.1.81
```
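The version requirement can also be checked in a script rather than by eye. The sketch below is one possible approach. The helper names are hypothetical, the exact format of the `claude --version` output is an assumption, and the comparison relies on a `sort` that supports `-V` (such as GNU coreutils).

```bash
# Sketch only: helper names are hypothetical, and the exact format of
# `claude --version` output is an assumption.

# Prints the first x.y.z token found in the given string.
extract_version() {
  printf '%s\n' "$1" | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1
}

# Succeeds when version $1 >= version $2.
# Uses version-aware sorting: the smaller version sorts first.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage (not run here because it needs the claude binary):
#   version_ge "$(extract_version "$(claude --version)")" 2.1.81 \
#     || echo "CC is older than 2.1.81; --bare is unavailable" >&2
```

Version-aware sorting matters here because a plain string comparison would rank `2.1.9` above `2.1.81`.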
Quick Reference
| Call Type | Command Pattern |
|---|---|
| Grading | |
| Trigger | |
| Optimize | |
| Force-skill | |
Invocation Patterns
Load detailed patterns and examples:
`Read("${CLAUDE_SKILL_DIR}/references/invocation-patterns.md")`
Grading Schemas
JSON schemas for structured eval output:
`Read("${CLAUDE_SKILL_DIR}/references/grading-schemas.md")`
Pipeline Integration
OrchestKit's eval scripts (`npm run eval:skill`) auto-detect bare mode:
```bash
# eval-common.sh detects ANTHROPIC_API_KEY → sets BARE_MODE=true
# Scripts add --bare to all non-plugin calls automatically
```
**Bare calls:** Trigger classification, force-skill, baseline, all grading.
**Never bare:** `run_with_skill` (needs plugin context for routing tests).
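The detection logic described above could look roughly like the following. This is a hypothetical sketch, not the actual contents of `eval-common.sh`. The function names `detect_bare_mode` and `claude_base_cmd` and the `plain`/`plugin` labels are invented for illustration.

```bash
# Hypothetical sketch of bare-mode detection; not the real eval-common.sh.

# Sets BARE_MODE=true when ANTHROPIC_API_KEY is non-empty, false otherwise.
detect_bare_mode() {
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then
    BARE_MODE=true
  else
    BARE_MODE=false
  fi
}

# Prints the base command for a call. Pass "plugin" for calls that need
# plugin context (such as run_with_skill); anything else is treated as a
# non-plugin call and gets --bare when bare mode is on.
claude_base_cmd() {
  if [ "$BARE_MODE" = true ] && [ "$1" != "plugin" ]; then
    echo "claude -p --bare"
  else
    echo "claude -p"
  fi
}

detect_bare_mode
```

Keeping the decision in one function means a routing test can never pick up `--bare` by accident: it asks for the `plugin` variant and always gets the full command.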
Performance
| Scenario | Without --bare | With --bare | Savings |
|---|---|---|---|
| Single grading call | ~3-5s startup | ~0.5-1s | 2-4x |
| Trigger (per prompt) | ~3-5s | ~0.5-1s | 2-4x |
| Full eval (50 calls) | ~150-250s overhead | ~25-50s | 3-5x |
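The full-eval row is consistent with multiplying the per-call startup time by 50 calls. A small sketch of that arithmetic follows. The function name is hypothetical, and times are in milliseconds so that shell integer arithmetic can handle the 0.5 s case.

```bash
# Prints total startup overhead in seconds for CALLS calls at PER_CALL_MS each.
total_overhead_s() {
  echo $(( $1 * $2 / 1000 ))
}

total_overhead_s 50 3000   # without --bare, low end  -> 150
total_overhead_s 50 5000   # without --bare, high end -> 250
total_overhead_s 50 500    # with --bare, low end     -> 25
total_overhead_s 50 1000   # with --bare, high end    -> 50
```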
Rules
`Read("${CLAUDE_SKILL_DIR}/rules/_sections.md")`
Troubleshooting
`Read("${CLAUDE_SKILL_DIR}/references/troubleshooting.md")`
Related
- `eval:skill` npm script: unified skill evaluation runner
- `eval:trigger`: trigger accuracy testing
- `eval:quality`: A/B quality comparison
- `optimize-description.sh`: iterative description improvement
- Version compatibility: `doctor/references/version-compatibility.md`