vercel-plugin-eval

Plugin Eval

Launch real Claude Code sessions with the plugin installed, monitor debug logs in real-time, and verify every hook fires correctly with proper dedup.

DO NOT (Hard Rules)

DO NOT use `claude --print` or `-p`: hooks don't fire and no files are created.
DO NOT use `--dangerously-skip-permissions`.
DO NOT create projects in `/tmp/`; always use `~/dev/vercel-plugin-testing/`.
DO NOT manually wire hooks or create `settings.local.json`; use `npx add-plugin`.
DO NOT set `CLAUDE_PLUGIN_ROOT` manually.
DO NOT use `bash -c` in WezTerm; use `/bin/zsh -ic`.
DO NOT use the full path to claude; use the `x` alias.
DO NOT write eval scripts; do everything as Bash tool calls in the conversation.

Copy the exact commands below. Do not improvise.

Quick Start

Always append a timestamp to directory names so reruns don't overwrite old projects:

```bash
# 1. Create test dir & install plugin (with timestamp)
TS=$(date +%Y%m%d-%H%M)
SLUG="my-eval-$TS"
mkdir -p ~/dev/vercel-plugin-testing/$SLUG
cd ~/dev/vercel-plugin-testing/$SLUG
npx add-plugin https://github.com/vercel/vercel-plugin -s project -y

# 2. Launch session via WezTerm
wezterm cli spawn --cwd /Users/johnlindquist/dev/vercel-plugin-testing/$SLUG -- /bin/zsh -ic \
  "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '<PROMPT>' --settings .claude/settings.json; exec zsh"

# 3. Find debug log (wait ~25s for session start)
find ~/.claude/debug -name "*.txt" -mmin -2 -exec grep -l "$SLUG" {} +
```
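
The fixed wait of about 25 seconds before looking for the debug log can be replaced by polling. The following is a minimal sketch, assuming the log lands under `~/.claude/debug` as in the Quick Start commands. `find_debug_log` and its retry parameters are hypothetical names, not part of the plugin. Paste it into a Bash tool call rather than saving it as a script.

```shell
# Hypothetical helper (not part of the plugin): poll for a recent debug log
# that mentions the slug, instead of sleeping for a fixed interval.
# Usage: find_debug_log <slug> [debug-dir] [tries] [delay-seconds]
find_debug_log() {
  local slug="$1"
  local dir="${2:-$HOME/.claude/debug}"
  local tries="${3:-30}"
  local delay="${4:-2}"
  local i=0
  local log
  while [ "$i" -lt "$tries" ]; do
    # Same find/grep as the Quick Start lookup; keep only the first match.
    log=$(find "$dir" -name "*.txt" -mmin -2 -exec grep -l -- "$slug" {} + 2>/dev/null | head -n 1)
    if [ -n "$log" ]; then
      printf '%s\n' "$log"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

A possible call is `LOG=$(find_debug_log "$SLUG")`, which gives up after roughly a minute with the default settings.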

What to Monitor

Hook firing (all 8 registered hooks)

```bash
LOG=~/.claude/debug/<session-id>.txt

# SessionStart (3 hooks)
grep "SessionStart.*success" "$LOG"

# PreToolUse skill injection
grep -c "executePreToolHooks" "$LOG"          # total calls
grep -c "provided additionalContext" "$LOG"   # injections

# UserPromptSubmit
grep "UserPromptSubmit.*success" "$LOG"

# PostToolUse validate + shadcn font-fix
grep "posttooluse-validate.*provided" "$LOG"
grep "PostToolUse:Bash.*success" "$LOG"

# SessionEnd cleanup
grep "SessionEnd" "$LOG"
```
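
The individual greps can be folded into one pass that flags any hook family with no matching log line. This is a sketch only: `hook_report` is a hypothetical name, the patterns are the ones used in the hook-firing checks, and it tests for presence rather than counting, so it does not confirm that all three SessionStart hooks ran.

```shell
# Hypothetical helper: report which hook patterns appear in a debug log.
# Usage: hook_report <debug-log>
hook_report() {
  local log="$1"
  local missing=0
  local pattern
  for pattern in \
    "SessionStart.*success" \
    "executePreToolHooks" \
    "UserPromptSubmit.*success" \
    "posttooluse-validate.*provided" \
    "PostToolUse:Bash.*success" \
    "SessionEnd"
  do
    if grep -q -- "$pattern" "$log"; then
      echo "fired    $pattern"
    else
      echo "MISSING  $pattern"
      missing=$((missing + 1))
    fi
  done
  # Exit status is 0 only when every pattern matched at least once.
  [ "$missing" -eq 0 ]
}
```

Running `hook_report "$LOG"` prints one line per pattern, and the exit status makes it easy to spot a session that needs a closer look.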

Dedup correctness (the key metric)

```bash
TMPDIR=$(node -e "import {tmpdir} from 'os'; console.log(tmpdir())" --input-type=module)
CLAIMDIR="$TMPDIR/vercel-plugin-<session-id>-seen-skills.d"

# Claim files = one per skill, atomic O_EXCL
ls "$CLAIMDIR"

# Compare: injections should equal claims
inject_meta=$(grep -c "skillInjection:" "$LOG")
claims=$(ls "$CLAIMDIR" 2>/dev/null | wc -l | tr -d ' ')
echo "Injections: $((inject_meta / 3)) | Claims: $claims"
```

`skillInjection:` appears 3x per actual injection in the debug log (initial check, parsed, success). Divide by 3.
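
The comparison can be turned into a pass/fail check. The sketch below assumes the divide-by-3 rule described here holds for the log being checked, and warns when the raw count is not a multiple of 3. `dedup_check` is a hypothetical name, not something the plugin provides.

```shell
# Hypothetical helper: compare injections against claim files for one session.
# Usage: dedup_check <debug-log> <claim-dir>
dedup_check() {
  local log="$1"
  local claimdir="$2"
  local raw
  local injections
  local claims
  # grep -c exits non-zero on zero matches but still prints 0.
  raw=$(grep -c "skillInjection:" "$log" || true)
  raw=${raw:-0}
  if [ $((raw % 3)) -ne 0 ]; then
    echo "warning: $raw skillInjection lines is not a multiple of 3" >&2
  fi
  injections=$((raw / 3))
  claims=$(ls "$claimdir" 2>/dev/null | wc -l | tr -d ' ')
  echo "Injections: $injections | Claims: $claims"
  # Succeeds only when the two counts agree.
  [ "$injections" -eq "$claims" ]
}
```

For example, `dedup_check "$LOG" "$CLAIMDIR" || echo "dedup mismatch"` prints the same summary line as the manual commands and flags a mismatch.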

PostToolUse validate quality

Look for real catches — API key bypass, outdated models, wrong patterns:

```bash
grep "VALIDATION" "$LOG" | head -10
```

Scenario Design

Describe products and features, never name specific technologies. Let the plugin infer which skills to inject. Always end prompts with: "Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."
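
Because every prompt must end with the same closing sentences, it may help to append them mechanically. A minimal sketch, in which `PROMPT_SUFFIX` and `build_prompt` are hypothetical names and the scenario text in the usage line is only an illustration:

```shell
# Hypothetical helper: append the required closing instructions to a scenario.
PROMPT_SUFFIX='Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running.'

# Usage: build_prompt "<product description>"
build_prompt() {
  printf '%s %s\n' "$1" "$PROMPT_SUFFIX"
}
```

A call such as `PROMPT=$(build_prompt "Build a support desk where customers chat with an assistant.")` yields the full prompt. Keep single quotes out of the description, since the launch command wraps the prompt in single quotes.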

Coverage targets by scenario type

Scenario Type	Skills Exercised
AI chat app	ai-sdk, ai-gateway, nextjs, ai-elements
Durable workflow	workflow, ai-sdk, vercel-queues
Monorepo	turborepo, turbopack, nextjs
Edge auth + routing	routing-middleware, auth, sign-in-with-vercel
Chat bot (multi-platform)	chat-sdk, ai-sdk, vercel-storage
Feature flags + CRM	vercel-flags, vercel-queues, ai-sdk
Email pipeline	email, satori, ai-sdk, vercel-storage
Marketplace/payments	payments, marketplace, cms
Kitchen sink	micro, ncc, all niche skills

Hard-to-trigger skills (8 of 44)

These need explicit technology references in the prompt because agents don't naturally reach for them:

`ai-elements`: say "use the AI Elements component registry"
`v0-dev`: say "generate components with v0"
`vercel-firewall`: say "use Vercel Firewall for rate limiting"
`marketplace`: say "publish to the Vercel Marketplace"
`geist`: say "install the geist font package"
`json-render`: name files `components/chat-*.tsx`

Coverage Report

Write results to `.notes/COVERAGE.md` with:

Session index — slug, session ID, unique skills, dedup status
Hook coverage matrix — which hooks fired in which sessions
Skill injection table — which of the 44 skills triggered
Dedup stats — injections vs claims per session
Issues found — bugs, pattern gaps, validation findings
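
One way to start the report is to generate an empty skeleton with the five sections listed above. The layout below (Markdown headings and the session index columns) is an assumption for illustration, and `coverage_skeleton` is a hypothetical name.

```shell
# Hypothetical helper: print an empty COVERAGE.md skeleton with the five sections.
# The heading levels and table columns are illustrative, not prescribed.
coverage_skeleton() {
  cat <<'EOF'
# Coverage

## Session index
| Slug | Session ID | Unique skills | Dedup status |
|------|------------|---------------|--------------|

## Hook coverage matrix

## Skill injection table

## Dedup stats

## Issues found
EOF
}
```

It could be used as `mkdir -p .notes && coverage_skeleton > .notes/COVERAGE.md`, after which each section is filled in per session.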

Cleanup

```bash
rm -rf ~/dev/vercel-plugin-testing
```
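
The command above removes every run. When only one timestamped run should go, a guarded variant may be safer. This is a sketch, with `cleanup_run` as a hypothetical name. It relies on the shell's `${var:?}` expansion, which fails when the variable is empty or unset.

```shell
# Hypothetical helper: remove one run directory and keep the others.
# ${var:?} fails with an error (and exits a non-interactive shell) if the
# value is empty or unset, so an empty slug never expands to the parent dir.
# Usage: cleanup_run <root-dir> <slug>
cleanup_run() {
  local root="$1"
  local slug="$2"
  rm -rf -- "${root:?}/${slug:?}"
}
```

For example, `cleanup_run ~/dev/vercel-plugin-testing "$SLUG"` deletes only the current run's directory.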