vercel-plugin-eval

Plugin Eval

Launch real Claude Code sessions with the plugin installed, monitor debug logs in real-time, and verify every hook fires correctly with proper dedup.

DO NOT (Hard Rules)

  • DO NOT use `claude --print` or `-p`: hooks don't fire and no files are created
  • DO NOT use `--dangerously-skip-permissions`
  • DO NOT create projects in `/tmp/`; always use `~/dev/vercel-plugin-testing/`
  • DO NOT manually wire hooks or create `settings.local.json`; use `npx add-plugin`
  • DO NOT set `CLAUDE_PLUGIN_ROOT` manually
  • DO NOT use `bash -c` in WezTerm; use `/bin/zsh -ic`
  • DO NOT use the full path to claude; use the `x` alias
  • DO NOT write eval scripts; do everything as Bash tool calls in the conversation

Copy the exact commands below. Do not improvise.

Quick Start

Always append a timestamp to directory names so reruns don't overwrite old projects:

1. Create test dir & install plugin (with timestamp)

TS=$(date +%Y%m%d-%H%M)
SLUG="my-eval-$TS"
mkdir -p ~/dev/vercel-plugin-testing/$SLUG
cd ~/dev/vercel-plugin-testing/$SLUG
npx add-plugin https://github.com/vercel/vercel-plugin -s project -y

2. Launch session via WezTerm

wezterm cli spawn --cwd /Users/johnlindquist/dev/vercel-plugin-testing/$SLUG -- /bin/zsh -ic \
  "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '<PROMPT>' --settings .claude/settings.json; exec zsh"

3. Find debug log (wait ~25s for session start)

find ~/.claude/debug -name "*.txt" -mmin -2 -exec grep -l "$SLUG" {} +
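
Because the log may not exist until the session has finished starting, a short retry loop can stand in for a fixed wait. The following is a minimal sketch, not part of the plugin: the function name `find_session_log`, the default of 12 attempts, and the 5-second delay are all assumptions chosen for illustration. It runs the same `find` command as above and can be pasted into a single Bash tool call.

```bash
# Hypothetical helper: retry the find command above until a matching log appears.
# Args: slug, debug dir (default ~/.claude/debug), attempts (default 12), delay in seconds (default 5)
find_session_log() {
  local slug="$1"
  local dir="${2:-$HOME/.claude/debug}"
  local attempts="${3:-12}"
  local delay="${4:-5}"
  local i=0
  local log=""
  while [ "$i" -lt "$attempts" ]; do
    # Same search as above; keep only the first matching file
    log=$(find "$dir" -name "*.txt" -mmin -2 -exec grep -l "$slug" {} + 2>/dev/null | head -n 1)
    if [ -n "$log" ]; then
      printf '%s\n' "$log"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

Usage would look like `LOG=$(find_session_log "$SLUG")`; the function returns a non-zero status if no log turns up within the attempt limit.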

What to Monitor

Hook firing (all 8 registered hooks)

bash
LOG=~/.claude/debug/<session-id>.txt

SessionStart (3 hooks)

grep "SessionStart.*success" "$LOG"

PreToolUse skill injection

grep -c "executePreToolHooks" "$LOG"          # total calls
grep -c "provided additionalContext" "$LOG"   # injections

UserPromptSubmit

grep "UserPromptSubmit.*success" "$LOG"

PostToolUse validate + shadcn font-fix

grep "posttooluse-validate.*provided" "$LOG"
grep "PostToolUse:Bash.*success" "$LOG"

SessionEnd cleanup

grep "SessionEnd" "$LOG"
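
To see all of these checks at a glance, the greps above can be folded into one loop that prints a match count per pattern. This is only a sketch: `hook_counts` is a hypothetical name, and the patterns are exactly the ones used in the commands above, so nothing here assumes log lines beyond what those commands already search for.

```bash
# Hypothetical helper: print a match count for each log pattern used above.
# Arg: path to the session debug log
hook_counts() {
  local log="$1"
  local pattern
  for pattern in \
    "SessionStart.*success" \
    "executePreToolHooks" \
    "provided additionalContext" \
    "UserPromptSubmit.*success" \
    "posttooluse-validate.*provided" \
    "PostToolUse:Bash.*success" \
    "SessionEnd"; do
    # grep -c prints 0 (and exits non-zero) when nothing matches; the 0 is kept
    printf '%-32s %s\n' "$pattern" "$(grep -c -- "$pattern" "$log")"
  done
}
```

Running `hook_counts "$LOG"` prints seven lines, one per pattern; a count of 0 marks a hook worth investigating.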

Dedup correctness (the key metric)

bash
TMPDIR=$(node -e "import {tmpdir} from 'os'; console.log(tmpdir())" --input-type=module)
CLAIMDIR="$TMPDIR/vercel-plugin-<session-id>-seen-skills.d"

Claim files = one per skill, atomic O_EXCL

ls "$CLAIMDIR"

Compare: injections should equal claims

inject_meta=$(grep -c "skillInjection:" "$LOG")
claims=$(ls "$CLAIMDIR" 2>/dev/null | wc -l | tr -d ' ')
echo "Injections: $((inject_meta / 3)) | Claims: $claims"

`skillInjection:` appears 3x per actual injection in the debug log (initial check, parsed, success). Divide by 3.
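
The comparison can also be turned into an explicit pass/fail result. The sketch below assumes the three-lines-per-injection pattern described above holds; the function name `dedup_status` and its output wording are hypothetical.

```bash
# Hypothetical helper: compare injections (log lines / 3) against claim files.
# Args: raw count of "skillInjection:" lines, number of claim files
dedup_status() {
  local inject_lines="$1"
  local claims="$2"
  # Integer division: 3 log lines per actual injection (assumption from the note above)
  local injections="$((inject_lines / 3))"
  if [ "$injections" -eq "$claims" ]; then
    echo "OK: $injections injections, $claims claims"
  else
    echo "MISMATCH: $injections injections, $claims claims"
    return 1
  fi
}
```

Call it as `dedup_status "$inject_meta" "$claims"`. Shell arithmetic truncates, so a raw line count that is not a multiple of 3 is silently rounded down and may deserve a closer look at the log.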

PostToolUse validate quality

Look for real catches — API key bypass, outdated models, wrong patterns:
bash
grep "VALIDATION" "$LOG" | head -10

Scenario Design

Describe products and features, never name specific technologies. Let the plugin infer which skills to inject. Always end prompts with: "Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."
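
One way to keep the required ending consistent across scenarios is to store it once and append it to each product description. In this sketch the variable names and the scenario text are hypothetical; only the closing sentences come from the rule above.

```bash
# Required closing sentences, copied from the rule above.
SUFFIX="Link the project to my vercel-labs team so we can deploy it later. Skip any planning and just build it. Get the dev server running."

# Hypothetical scenario: describes the product and names no technology.
SCENARIO="Build a support desk where customers chat with an assistant that answers from our help articles."

PROMPT="$SCENARIO $SUFFIX"
printf '%s\n' "$PROMPT"
```

The launch command wraps `<PROMPT>` in single quotes, so a prompt containing an apostrophe would end the quoting early; phrasing the scenario without apostrophes avoids that.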

Coverage targets by scenario type

| Scenario Type | Skills Exercised |
|---------------|------------------|
| AI chat app | ai-sdk, ai-gateway, nextjs, ai-elements |
| Durable workflow | workflow, ai-sdk, vercel-queues |
| Monorepo | turborepo, turbopack, nextjs |
| Edge auth + routing | routing-middleware, auth, sign-in-with-vercel |
| Chat bot (multi-platform) | chat-sdk, ai-sdk, vercel-storage |
| Feature flags + CRM | vercel-flags, vercel-queues, ai-sdk |
| Email pipeline | email, satori, ai-sdk, vercel-storage |
| Marketplace/payments | payments, marketplace, cms |
| Kitchen sink | micro, ncc, all niche skills |

Hard-to-trigger skills (8 of 44)

These need explicit technology references in the prompt because agents don't naturally reach for them:
  • `ai-elements`: say "use the AI Elements component registry"
  • `v0-dev`: say "generate components with v0"
  • `vercel-firewall`: say "use Vercel Firewall for rate limiting"
  • `marketplace`: say "publish to the Vercel Marketplace"
  • `geist`: say "install the geist font package"
  • `json-render`: name files `components/chat-*.tsx`

Coverage Report

Write results to `.notes/COVERAGE.md` with:
  1. Session index — slug, session ID, unique skills, dedup status
  2. Hook coverage matrix — which hooks fired in which sessions
  3. Skill injection table — which of the 44 skills triggered
  4. Dedup stats — injections vs claims per session
  5. Issues found — bugs, pattern gaps, validation findings
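
A report skeleton with these five sections can be created up front and filled in per session. This is a sketch under assumptions: the function name `write_coverage_skeleton`, the heading wording, and the table columns are illustrative choices derived from the list above, not a format the plugin requires.

```bash
# Hypothetical helper: write an empty COVERAGE.md with the five sections listed above.
# Arg: output path (default .notes/COVERAGE.md)
write_coverage_skeleton() {
  local out="${1:-.notes/COVERAGE.md}"
  mkdir -p "$(dirname "$out")"
  cat > "$out" <<'EOF'
# Coverage

## Session index
| Slug | Session ID | Unique skills | Dedup status |
|------|------------|---------------|--------------|

## Hook coverage matrix

## Skill injection table

## Dedup stats
| Session | Injections | Claims |
|---------|------------|--------|

## Issues found
EOF
}
```

Calling `write_coverage_skeleton` with no argument writes to `.notes/COVERAGE.md` in the current directory and overwrites any existing file at that path.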

Cleanup

bash
rm -rf ~/dev/vercel-plugin-testing