Advanced AI agent benchmark scenarios that push Vercel's cutting-edge platform features — Workflow DevKit, AI Gateway, MCP, Chat SDK, Queues, Flags, Sandbox, and multi-agent orchestration. Designed to stress-test skill injection for complex, multi-system builds.
Install the benchmark scenarios:

```shell
npx skill4agent add vercel/vercel-plugin benchmark-agents
```

Test projects live under `~/dev/vercel-plugin-testing/` (not `/tmp/`), one directory per run, named `<slug>-<yyyymmdd>-<hhmm>` (e.g. `tarot-card-deck-20260309-1227`, `interior-designer-20260309-1227`). Each agent runs in its own `wezterm` pane (created with `wezterm cli spawn`, cleaned up with `release`) rather than headlessly via `claude --print` or `Bun.spawn(["claude", ...])`, so every session gets a real `session_id` and its own debug log under `~/.claude/debug/`.

Single-run setup:

```shell
TS=$(date +%Y%m%d-%H%M)
SLUG="my-app-$TS"
mkdir -p ~/dev/vercel-plugin-testing/$SLUG
cd ~/dev/vercel-plugin-testing/$SLUG
npx add-plugin https://github.com/vercel/vercel-plugin -s project -y
```

Then spawn the agent in a new pane:

```shell
wezterm cli spawn --cwd /Users/johnlindquist/dev/vercel-plugin-testing/$SLUG -- /bin/zsh -ic \
  "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '<PROMPT>' --settings .claude/settings.json; exec zsh"
```

Notes: `unset CLAUDECODE` avoids nested-session detection; `VERCEL_PLUGIN_LOG_LEVEL=debug` turns on verbose plugin logging in `~/.claude/debug/`; the interactive `/bin/zsh -ic` (rather than `bash -c` or `bash -lc`) is what lets the `x` alias for `claude` resolve; `--settings .claude/settings.json` points the session at the project-scoped settings. To locate the session's debug log afterwards:

```shell
find ~/.claude/debug -name "*.txt" -mmin -2 -exec grep -l "$SLUG" {} +
```
To run several scenarios in parallel, create the projects first, then spawn each in its own pane:

```shell
TS=$(date +%Y%m%d-%H%M)
cd ~/dev/vercel-plugin-testing
for name in tarot-deck interior-designer superhero-origin; do
  d="${name}-${TS}"
  mkdir -p "$d" && (cd "$d" && npx add-plugin https://github.com/vercel/vercel-plugin -s project -y)
done

# Then spawn each (these run in separate terminal panes)
wezterm cli spawn --cwd .../tarot-deck-$TS -- /bin/zsh -ic "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '...' --settings .claude/settings.json; exec zsh"
wezterm cli spawn --cwd .../interior-designer-$TS -- /bin/zsh -ic "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '...' --settings .claude/settings.json; exec zsh"
wezterm cli spawn --cwd .../superhero-origin-$TS -- /bin/zsh -ic "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '...' --settings .claude/settings.json; exec zsh"
```
The plugin records each skill it injects in a per-session "claim" directory under the OS temp dir:

```shell
TMPDIR=$(node -e "import {tmpdir} from 'os'; console.log(tmpdir())" --input-type=module)
CLAIMDIR="$TMPDIR/vercel-plugin-<session-id>-seen-skills.d"

# List all injected skills
ls "$CLAIMDIR"
# Count
ls "$CLAIMDIR" | wc -l
# Check specific skill
ls "$CLAIMDIR/workflow" && echo "YES" || echo "NO"
```
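For quick comparisons, the listing and count above can be wrapped in a small helper. This is a sketch: `summarize_skills` is a hypothetical name, and it assumes the claim-directory layout described above.

```shell
# Hypothetical helper: print "count:skill1,skill2,..." for one claim directory
summarize_skills() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "0:"
    return
  fi
  local names count
  names=$(ls "$dir" | sort | paste -sd, -)
  count=$(ls "$dir" | wc -l | tr -d ' ')
  echo "$count:$names"
}
```

Usage: `summarize_skills "$CLAIMDIR"`.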
Hook activity can be counted from the session's debug log:

```shell
LOG=~/.claude/debug/<session-id>.txt

# SessionStart hooks
grep -c 'SessionStart.*success' "$LOG"
# PreToolUse calls and injections
grep -c 'executePreToolHooks' "$LOG"        # total calls
grep -c 'provided additionalContext' "$LOG" # actual injections
# PostToolUse validation catches
grep 'VALIDATION' "$LOG" | head -10
# UserPromptSubmit
grep -c 'UserPromptSubmit.*success' "$LOG"
```
To compare skill claims across several sessions at once:

```shell
TMPDIR=$(node -e "import {tmpdir} from 'os'; console.log(tmpdir())" --input-type=module 2>/dev/null)
for label_id in "slug1:SESSION_ID_1" "slug2:SESSION_ID_2" "slug3:SESSION_ID_3"; do
  label="${label_id%%:*}"
  id="${label_id##*:}"
  claimdir="$TMPDIR/vercel-plugin-$id-seen-skills.d"
  echo "=== $label ==="
  count=$(ls "$claimdir" 2>/dev/null | wc -l | tr -d ' ')
  claims=$(ls "$claimdir" 2>/dev/null | sort | tr '\n' ', ')
  echo "Skills ($count): $claims"
done
```
With `base` set to a project directory, verify the expected Workflow DevKit layout:

```shell
echo -n "src/: ";            test -d "$base/src" && echo YES || echo NO   # Should be NO for WDK projects
echo -n "workflows/: ";      test -d "$base/workflows" && echo YES || echo NO
echo -n "withWorkflow: ";    grep -q "withWorkflow" "$base"/next.config.* && echo YES || echo NO
echo -n "components.json: "; test -f "$base/components.json" && echo YES || echo NO
```
Image generation should go through the current Gemini image model:

```shell
# Should use gemini-3.1-flash-image-preview, NOT dall-e-3 or older gemini models
grep -rn "gemini.*image\|dall-e\|experimental_generateImage\|result\.files" "$base/workflows/" "$base/app/" 2>/dev/null | grep "\.ts"
```
Model calls should route through the AI Gateway:

```shell
# Should use gateway() or plain "provider/model" strings, NOT openai("gpt-4o") directly
grep -rn "from.*@ai-sdk/openai\|openai(" "$base" 2>/dev/null | grep "\.ts" | grep -v node_modules
grep -rn "gateway(\|model:.*\"openai/" "$base" 2>/dev/null | grep "\.ts" | grep -v node_modules
```

Count the AI Elements components in use:

```shell
find "$base" -path "*/ai-elements/*.tsx" 2>/dev/null | grep -v node_modules | wc -l
```

And confirm the workflow entry point imports from the WDK:

```shell
wf=$(find "$base" -name "*.ts" -path "*/workflow*" 2>/dev/null | grep -v node_modules | head -1)
head -5 "$wf"   # Should show: import { getWritable } from "workflow"
```

Issues observed across runs, their causes, and the plugin fixes that addressed them:

| Issue | Cause | Plugin Fix (version) |
|---|---|---|
| Workflow not triggered from natural language | promptSignals too narrow | Broadened phrases, lowered minScore 6→4 (v0.9.5) |
| Agent uses `openai()` directly instead of the gateway | Agent's training data defaults to openai | PostToolUse validate warns "your knowledge is outdated" (v0.9.9) |
| Agent uses an outdated image model (e.g. `dall-e-3`) | Agent doesn't know about gemini image gen | PostToolUse validate warns, capabilities table in ai-sdk (v0.9.7) |
| Agent uses an old API | Old API | PostToolUse validate warns, recommends … |
| Raw markdown rendering (…) | Agent skips AI Elements | |
| Workflows outside `workflows/` | | Canonical structure docs: no `src/` |
| … | Agent skipped setup step | Marked as "Required" in workflow skill (v0.8.1) |
| … | Agent didn't wire the 3-piece pattern | Documented as 3 required pieces (v0.9.3) |
| … | Agent's training data | PostToolUse validate catches as error (v0.9.3) |
| … | Sandbox violation | Strengthened warning in skill (v0.8.1) |
| Missing … | No OIDC credentials | Added as "Required" setup step (v0.9.1) |
| … | WDK quirk | Documented: guard with … |
| shadcn not installed | No trigger for scaffolding | Added … |
| Skill cap too low (3) | Only 3 skills injected per tool call | Raised to 5 with 18KB budget (v0.8.0) |
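Several of the tabled issues can be spot-checked mechanically with the greps shown earlier. A minimal sketch: `check_project` is a hypothetical helper, not part of the plugin, and it assumes the project layout used above.

```shell
# Hypothetical checker for three of the recurring issues above
check_project() {
  local base="$1" bad=0
  # Direct openai import instead of the gateway
  if grep -rn 'from.*@ai-sdk/openai' "$base" --include='*.ts' 2>/dev/null | grep -qv node_modules; then
    echo "WARN: direct @ai-sdk/openai import (route through the gateway instead)"
    bad=1
  fi
  # Outdated image model
  if grep -rn 'dall-e' "$base" --include='*.ts' 2>/dev/null | grep -qv node_modules; then
    echo "WARN: outdated image model reference (dall-e)"
    bad=1
  fi
  # WDK projects should not have a src/ directory
  if [ -d "$base/src" ]; then
    echo "WARN: src/ present (WDK projects keep workflows/ at the root)"
    bad=1
  fi
  return $bad
}
```

Usage: `check_project "$base" || echo "issues found"`.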
Verify the running app in a real browser with `agent-browser`:

```shell
agent-browser open http://localhost:<port>
agent-browser wait --load networkidle
agent-browser screenshot
agent-browser snapshot -i
```

Track per-scenario results in `.notes/COVERAGE.md`. Before marking a scenario complete, run `bun run typecheck && bun test && bun run validate` and a production `bun run build`.

The twelve benchmark scenarios:

| # | Slug | Prompt Summary | Expected Skills |
|---|---|---|---|
| 01 | doc-qa-agent | PDF Q&A with embeddings, citations, multi-step reasoning | ai-sdk, nextjs, vercel-storage, ai-elements |
| 02 | customer-support-agent | Durable support agent, escalation, confidence tracking | ai-sdk, workflow, nextjs, ai-elements |
| 03 | deploy-monitor | Uptime monitoring, AI incident responder, durable investigation | workflow, cron-jobs, observability, ai-sdk |
| 04 | multi-model-router | Side-by-side model comparison, parallel streaming, cost tracking | ai-gateway, ai-sdk, nextjs, ai-elements |
| 05 | slack-pr-reviewer | Multi-platform chat bot, PR review, threaded conversations | chat-sdk, ai-sdk, nextjs |
| 06 | content-pipeline | Durable multi-step content production with image generation | workflow, ai-sdk, satori, nextjs |
| 07 | feature-rollout | Feature flags, A/B testing, AI experiment analysis | vercel-flags, ai-sdk, nextjs |
| 08 | event-driven-crm | Event-driven CRM, churn prediction, re-engagement emails | vercel-queues, workflow, ai-sdk, email |
| 09 | code-sandbox-tutor | AI coding tutor with sandbox execution, auto-fix | vercel-sandbox, ai-sdk, nextjs, ai-elements |
| 10 | multi-agent-research | Parallel sub-agents, durable orchestration, streaming synthesis | workflow, ai-sdk, ai-elements, nextjs |
| 11 | discord-game-master | RPG bot, persistent game state, scene illustration generation | chat-sdk, ai-sdk, vercel-storage, nextjs |
| 12 | compliance-auditor | Scheduled AI audits, durable approval workflow, deploy blocking | workflow, cron-jobs, ai-sdk, vercel-firewall |
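Putting the pieces together, a run of scenario 01 can be prepared with the same setup pattern shown earlier. This is a sketch: `make_slug` is a hypothetical helper and the prompt text is illustrative, not the benchmark's canonical prompt. It prints the commands instead of executing them so they can be reviewed first.

```shell
# Derive a timestamped slug per the <slug>-<yyyymmdd>-<hhmm> convention
make_slug() { printf '%s-%s' "$1" "$(date +%Y%m%d-%H%M)"; }

SLUG=$(make_slug doc-qa-agent)
PROMPT='Build a PDF Q&A app with embeddings, citations, and multi-step reasoning'

cat <<EOF
mkdir -p ~/dev/vercel-plugin-testing/$SLUG && cd ~/dev/vercel-plugin-testing/$SLUG
npx add-plugin https://github.com/vercel/vercel-plugin -s project -y
wezterm cli spawn --cwd ~/dev/vercel-plugin-testing/$SLUG -- /bin/zsh -ic \\
  "unset CLAUDECODE; VERCEL_PLUGIN_LOG_LEVEL=debug x '$PROMPT' --settings .claude/settings.json; exec zsh"
EOF
```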
To reset the test workspace:

```shell
rm -rf ~/dev/vercel-plugin-testing
```
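If recent runs should survive cleanup, a more conservative variant is possible. This is a sketch; `cleanup_old_runs` is a hypothetical helper, not part of the plugin.

```shell
# Hypothetical: remove only benchmark directories older than one day
cleanup_old_runs() {
  # -mtime +0 matches entries modified more than 24 hours ago
  find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +0 -exec rm -rf {} +
}
```

Usage: `cleanup_old_runs ~/dev/vercel-plugin-testing`.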