vercel-plugin-eval
Plugin Eval
Launch real Claude Code sessions with the plugin installed, monitor the debug logs in real time, and verify that every hook fires correctly and that skill injections are deduplicated.
DO NOT (Hard Rules)
- DO NOT use `claude --print` or `-p`: hooks don't fire and no files are created
- DO NOT use `--dangerously-skip-permissions`
- DO NOT create projects in `/tmp/`: always use `~/dev/vercel-plugin-testing/`
- DO NOT manually wire hooks or create `settings.local.json`: use `npx add-plugin`
- DO NOT set `CLAUDE_PLUGIN_ROOT` manually
- DO NOT use `bash -c` in WezTerm: use `/bin/zsh -ic`
- DO NOT use the full path to claude: use the `x` alias
- DO NOT write eval scripts: do everything as Bash tool calls in the conversation

Copy the exact commands below. Do not improvise.
Quick Start
Always append a timestamp to directory names so reruns don't overwrite old projects:
```bash
# 1. Create test dir & install plugin (with timestamp)
TS=$(date +%Y%m%d-%H%M)
SLUG="my-eval-$TS"
mkdir -p ~/dev/vercel-plugin-testing/$SLUG
cd ~/dev/vercel-plugin-testing/$SLUG
npx add-plugin https://github.com/vercel/vercel-plugin -s project -y

# 2. Launch session via WezTerm
wezterm cli spawn --cwd /Users/johnlindquist/dev/vercel-plugin-testing/$SLUG -- /bin/zsh -ic \
  "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '<PROMPT>' --settings .claude/settings.json; exec zsh"

# 3. Find debug log (wait ~25s for session start)
find ~/.claude/debug -name "*.txt" -mmin -2 -exec grep -l "$SLUG" {} +
```
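The fixed ~25s wait in step 3 can be replaced by polling. The following is a minimal sketch, assuming the log location and the `find` filter from step 3. `wait_for_log`, its arguments, and the 60-second default timeout are hypothetical names chosen for illustration, not part of the plugin.

```shell
# Hypothetical helper: poll until a recent debug log mentions the slug.
# Usage: wait_for_log <slug> [debug-dir] [timeout-seconds]
wait_for_log() {
  local slug="$1"
  local dir="${2:-$HOME/.claude/debug}"
  local timeout="${3:-60}"
  local waited=0 found=""
  while [ "$waited" -lt "$timeout" ]; do
    # Same filter as step 3: .txt logs modified in the last 2 minutes.
    found=$(find "$dir" -name "*.txt" -mmin -2 -exec grep -l -- "$slug" {} + 2>/dev/null | head -n 1)
    if [ -n "$found" ]; then
      printf '%s\n' "$found"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 1   # no matching log appeared before the timeout
}
```

Called as `LOG=$(wait_for_log "$SLUG")`, it prints the first matching log path, or returns non-zero if none appears in time.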
What to Monitor
Hook firing (all 8 registered hooks)
```bash
# <session-id> is a placeholder; substitute the log found in Quick Start step 3
LOG=~/.claude/debug/<session-id>.txt

# SessionStart (3 hooks)
grep "SessionStart.*success" "$LOG"

# PreToolUse skill injection
grep -c "executePreToolHooks" "$LOG"         # total calls
grep -c "provided additionalContext" "$LOG"  # injections

# UserPromptSubmit
grep "UserPromptSubmit.*success" "$LOG"

# PostToolUse validate + shadcn font-fix
grep "posttooluse-validate.*provided" "$LOG"
grep "PostToolUse:Bash.*success" "$LOG"

# SessionEnd cleanup
grep "SessionEnd" "$LOG"
```
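To see all of these checks at a glance, the patterns can be counted in one pass. This is a sketch only: `hook_summary` is a hypothetical helper, and it assumes the debug log wording matches the grep patterns listed above.

```shell
# Hypothetical helper: print a match count for each pattern used above.
# Usage: hook_summary <log-file>
hook_summary() {
  local log="$1" pattern
  for pattern in \
    "SessionStart.*success" \
    "executePreToolHooks" \
    "provided additionalContext" \
    "UserPromptSubmit.*success" \
    "posttooluse-validate.*provided" \
    "PostToolUse:Bash.*success" \
    "SessionEnd"; do
    # grep -c prints 0 and exits 1 when nothing matches, so ignore the status.
    printf '%-34s %s\n' "$pattern" "$(grep -c -- "$pattern" "$log" || true)"
  done
}
```

Running `hook_summary "$LOG"` prints one line per pattern. A count of 0 marks a hook to investigate.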
Dedup correctness (the key metric)
```bash
TMPDIR=$(node -e "import {tmpdir} from 'os'; console.log(tmpdir())" --input-type=module)
CLAIMDIR="$TMPDIR/vercel-plugin-<session-id>-seen-skills.d"

# Claim files = one per skill, atomic O_EXCL
ls "$CLAIMDIR"

# Compare: injections should equal claims
inject_meta=$(grep -c "skillInjection:" "$LOG")
claims=$(ls "$CLAIMDIR" 2>/dev/null | wc -l | tr -d ' ')
echo "Injections: $((inject_meta / 3)) | Claims: $claims"
```

`skillInjection:` appears 3x per actual injection in the debug log (initial check, parsed, success). Divide by 3.
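The comparison can also be turned into a pass/fail check. This is a minimal sketch under the same assumptions as the commands above (three `skillInjection:` lines per injection, one claim file per skill). `check_dedup` is a hypothetical name, not something the plugin provides.

```shell
# Hypothetical helper: compare injections (from the log) with claim files (on disk).
# Usage: check_dedup <log-file> <claim-dir>
check_dedup() {
  local log="$1" claimdir="$2"
  local inject_meta injections claims
  # grep -c exits 1 on zero matches; an unreadable log yields an empty string.
  inject_meta=$(grep -c "skillInjection:" "$log" 2>/dev/null || true)
  injections=$(( ${inject_meta:-0} / 3 ))   # 3 log lines per injection
  claims=$(ls "$claimdir" 2>/dev/null | wc -l | tr -d ' ')
  echo "Injections: $injections | Claims: $claims"
  # Exit status is 0 only when the two counts agree.
  [ "$injections" -eq "$claims" ]
}
```

For example, `check_dedup "$LOG" "$CLAIMDIR" || echo "dedup mismatch"` flags a session whose counts disagree.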
PostToolUse validate quality
Look for real catches — API key bypass, outdated models, wrong patterns:
```bash
grep "VALIDATION" "$LOG" | head -10
```

Scenario Design
Describe products and features, never name specific technologies. Let the plugin infer which skills to inject. Always end prompts with: "Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."
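One way to keep the required ending consistent across scenarios is a small prompt builder. This is a sketch: `build_prompt` and the sample product description are made up for illustration. Only the suffix text comes from the guidance above.

```shell
# Required closing instruction, copied verbatim from the guidance above.
SUFFIX="Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."

# Hypothetical helper: append the suffix to a product description.
build_prompt() {
  printf '%s %s\n' "$1" "$SUFFIX"
}

# Example description (made up): features only, no technology names.
PROMPT=$(build_prompt "Build a support desk where customers chat with an assistant that remembers past tickets.")
echo "$PROMPT"
```

Because the launch command wraps the prompt in single quotes, keep apostrophes out of the description so the quoting stays intact.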
Coverage targets by scenario type
| Scenario Type | Skills Exercised |
|---|---|
| AI chat app | ai-sdk, ai-gateway, nextjs, ai-elements |
| Durable workflow | workflow, ai-sdk, vercel-queues |
| Monorepo | turborepo, turbopack, nextjs |
| Edge auth + routing | routing-middleware, auth, sign-in-with-vercel |
| Chat bot (multi-platform) | chat-sdk, ai-sdk, vercel-storage |
| Feature flags + CRM | vercel-flags, vercel-queues, ai-sdk |
| Email pipeline | email, satori, ai-sdk, vercel-storage |
| Marketplace/payments | payments, marketplace, cms |
| Kitchen sink | micro, ncc, all niche skills |
Hard-to-trigger skills (8 of 44)
These need explicit technology references in the prompt because agents don't naturally reach for them:
- `ai-elements`: say "use the AI Elements component registry"
- `v0-dev`: say "generate components with v0"
- `vercel-firewall`: say "use Vercel Firewall for rate limiting"
- `marketplace`: say "publish to the Vercel Marketplace"
- `geist`: say "install the geist font package"
- `json-render`: name files `components/chat-*.tsx`
Coverage Report
Write results to `.notes/COVERAGE.md` with:
- Session index: slug, session ID, unique skills, dedup status
- Hook coverage matrix: which hooks fired in which sessions
- Skill injection table: which of the 44 skills triggered
- Dedup stats: injections vs claims per session
- Issues found: bugs, pattern gaps, validation findings
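As an illustration of the session index, the sketch below appends one row per session to the report. The column layout and the `add_session_row` helper are assumptions. Only the four fields (slug, session ID, unique skills, dedup status) come from the list above.

```shell
# Hypothetical helper: append one session-index row to the report.
# Usage: add_session_row <slug> <session-id> <unique-skills> <dedup-status> [report-file]
add_session_row() {
  local report="${5:-.notes/COVERAGE.md}"
  mkdir -p "$(dirname "$report")"
  if [ ! -s "$report" ]; then
    # Table layout is an assumption; write the header once for a new report.
    {
      echo "| Slug | Session ID | Unique skills | Dedup |"
      echo "|---|---|---|---|"
    } > "$report"
  fi
  printf '| %s | %s | %s | %s |\n' "$1" "$2" "$3" "$4" >> "$report"
}
```

A call such as `add_session_row "$SLUG" "<session-id>" 5 OK` writes the header on first use and adds one row per call after that.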
Cleanup
```bash
rm -rf ~/dev/vercel-plugin-testing
```