Task: Test Feature & Generate Demo Report
Analyze the current branch's changes, scope the coverage level with the user, build an explicit testing strategy, start the app's dev server, test the feature in a real browser using `agent-browser`, capture screenshots and video (always — no exceptions), upload everything to S3, and generate a structured test report (always — never skip).
Phase 0: Environment Setup
1. Get artifact context IDs
Use the active task context when it is present in the prompt. Otherwise:
- The task ID is available from the `AIDEN_TASK_ID` environment variable.
- The conversation ID is available from the `AIDEN_SESSION_ID` environment variable.
- Resolve the team ID with Aiden MCP context/tools before creating the report.
When calling `mcp__aiden__create_test_report`, always pass `taskId`, `conversationId`, and `teamId` explicitly. Do not substitute one ID for another.
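A minimal sketch of the environment-variable fallback (these variable names come from the Phase 4 report parameters below):
```bash
# Fall back to environment variables when no active task context is in the prompt
TASK_ID="${AIDEN_TASK_ID:?AIDEN_TASK_ID is not set; resolve the task ID via Aiden MCP tools}"
CONVERSATION_ID="${AIDEN_SESSION_ID:?AIDEN_SESSION_ID is not set}"
echo "taskId=$TASK_ID conversationId=$CONVERSATION_ID"
```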
2. Verify agent-browser is available
`agent-browser` is a CLI tool by Vercel Labs (https://github.com/vercel-labs/agent-browser) — it is NOT a skill or MCP tool. It provides headless browser automation via bash commands (open, click, fill, screenshot, record). In sandboxes it is pre-installed by default.
```bash
command -v agent-browser >/dev/null 2>&1 && echo "agent-browser: OK" || echo "agent-browser: MISSING"
```
If missing, install it:
```bash
npm install -g agent-browser && agent-browser install
```
If installation fails (e.g. no network, no npm), STOP and tell the user:
"agent-browser is not available and could not be installed. Install it manually: `npm install -g agent-browser && agent-browser install`"
Do NOT proceed to Phase 2 without a working `agent-browser` — all browser testing depends on it.
3. Create workspace
All captures (screenshots, videos) MUST be written to `/tmp/aiden-captures`.
```bash
mkdir -p /tmp/aiden-captures
```
CRITICAL — file path rules:
- ONLY write capture files to `/tmp/aiden-captures`. Never anywhere else.
- NEVER write to the home directory, project directories, the current working directory, or any path inside the repo.
- Sensitive system directories are off-limits — writing to them will be blocked and will abort the test run.
- If you are tempted to create a screenshots or captures folder anywhere other than `/tmp/aiden-captures`, stop and use `/tmp/aiden-captures` instead.
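A minimal guard sketch you can run before any write (the `TARGET` value is a hypothetical example; substitute the path you are about to write):
```bash
CAPTURE_DIR=/tmp/aiden-captures
TARGET="$CAPTURE_DIR/step-01.png"   # hypothetical example path
case "$TARGET" in
  "$CAPTURE_DIR"/*) : ;;  # inside the allowed directory; proceed
  *) echo "Refusing to write outside $CAPTURE_DIR: $TARGET" >&2; exit 1 ;;
esac
```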
Phase 0.5: Coverage Scoping & Context Gathering (MANDATORY — do before anything else)
Never skip this phase. You must understand what the user wants before writing a single test step.
1. Ask about coverage level
What level of test coverage do you want for this feature?
- light — Happy path only. Quick smoke test to verify the main flow works.
- standard — Happy path + key edge cases + basic error states. (Default)
- comprehensive — Full coverage: happy path, all edge cases, error states, non-happy paths, boundary values, accessibility, responsiveness.
Also: are there any specific flows, known bugs, or risky areas you want me to focus on?
Wait for the response before continuing.
2. Ask targeted follow-up questions
Based on the feature (which you may not know yet — do a quick `git log`/`git diff` first to get a hint), ask the user 1–3 targeted follow-up questions. Examples:
- "Is there an authenticated state I should test? If yes, what test credentials should I use?"
- "Are there any known edge cases or previous bugs related to this feature?"
- "What's the definition of 'working correctly' for this feature — what should I see?"
- "Are there any specific non-happy paths you're concerned about (e.g. invalid input, network errors, empty states)?"
- "Any pages or user roles I should specifically include or exclude?"
Do not skip questions if context is unclear. Ask. A well-scoped test is 10× more valuable than a blind one.
3. Confirm the plan
After gathering answers, summarize back to the user (no tool call needed — just a short message):
- Coverage level chosen
- Specific areas / flows to focus on
- Any known risks or edge cases to target
- Rough count of test scenarios expected
Only proceed to Phase 1 after this confirmation.
Phase 1: Discover What to Test
1. Analyze the branch
Run these commands to understand what changed:
```bash
git log main..HEAD --oneline 2>/dev/null || git log HEAD~5..HEAD --oneline
git diff main...HEAD --stat 2>/dev/null || git diff HEAD~1 --stat
```
Read the actual diff to understand the feature or bug fix. Identify:
- What part of the app is affected (which pages, components, API routes)
- What the expected behavior change is
- What URL path to navigate to for testing
2. Discover and start the dev server
Look at the project to figure out how to run it:
- Read `package.json` — check the `dev`, `start`, and `serve` scripts
- Check for `package-lock.json` / `yarn.lock` / `pnpm-lock.yaml`
- Check for a `Makefile` (look for `dev` or `run` targets)
- Check for `manage.py`, `requirements.txt`, `pyproject.toml`, `Gemfile`, `docker-compose.yml`
- Check for framework-specific files: `next.config.js`, `vite.config.ts`, `nuxt.config.ts`, `angular.json`, `svelte.config.js`, `astro.config.mjs`
Start the dev server in the background. Common patterns:
```bash
# Node.js
npm run dev &
# or: pnpm dev &, yarn dev &, npx next dev &, npx vite &

# Python
python manage.py runserver &
# or: flask run &, uvicorn main:app &

# Ruby
bundle exec rails server &

# Docker
docker compose up -d
```
3. Wait for the server
Poll until the server is responding:
```bash
# Replace PORT with the discovered port
for i in $(seq 1 30); do
  curl -sf http://localhost:PORT >/dev/null 2>&1 && break
  sleep 2
done
```
Check common ports if unclear: 3000, 5173, 8080, 4200, 8000, 4000, 3001, 8888.
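If the port is unknown, a quick probe over those common ports (a sketch; adjust the list as needed):
```bash
for PORT in 3000 5173 8080 4200 8000 4000 3001 8888; do
  if curl -sf "http://localhost:$PORT" >/dev/null 2>&1; then
    echo "Server responding on port $PORT"
    break
  fi
done
```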
4. Determine the app URL
- Parse dev server stdout/stderr for "Local:" or "ready on" messages with URLs
- Check `.env` or `.env.local` for `PORT`, `VITE_PORT`, or similar
- Test common ports with `curl -sf http://localhost:PORT >/dev/null`
- If you cannot determine it, ask the user
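A sketch of the stdout parsing, assuming you redirected the dev server's output to a log file when starting it (e.g. `npm run dev > /tmp/devserver.log 2>&1 &`):
```bash
# Pull the first URL printed near a "Local:" or "ready" line
APP_URL=$(grep -iE 'local:|ready' /tmp/devserver.log | grep -oE 'https?://[^ ]+' | head -n1)
echo "App URL: ${APP_URL:-unknown}"
```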
5. Summarize before proceeding
Tell the user:
- What you found in the diff (feature/fix summary)
- What URL you will test
- What test scenarios you plan to cover
Phase 1.5: Build Testing Strategy & Create Todos (MANDATORY)
Before opening the browser, you must have an explicit plan. Do not improvise test steps on the fly.
1. Draft the testing strategy
Based on:
- The coverage level the user chose in Phase 0.5
- The diff analysis from Phase 1
- The context gathered from the user
Write out the full test plan as a structured list. For each scenario, note:
- Scenario name — short label (e.g. "Happy path: create item")
- Type — happy path / non-happy path / edge case / error state / visual / navigation
- Steps — what to do
- Expected outcome — what "pass" looks like
- Screenshot needed — yes/no
Coverage requirements by level:
- light: happy path only (1–3 scenarios)
- standard: happy path + 2–4 non-happy paths + 1–2 error states
- comprehensive: happy path + all non-happy paths + all error states + boundary values + empty states + responsiveness + navigation
Non-happy path examples to consider:
- Invalid / missing required inputs
- Submitting with no data / empty state
- Duplicate entries (if applicable)
- Permission denied / unauthorized access
- Network error / API failure simulation (if testable)
- Rapid repeated actions (double-click, spam submit)
- Long input strings / special characters
- Back button / browser navigation mid-flow
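For the long-input and special-character cases, a hedged example of the kind of payload to try (the element ref `@e3` is hypothetical; take the real ref from a snapshot):
```bash
# 500-character input plus special characters; watch for truncation, crashes, or unescaped rendering
LONG=$(printf 'A%.0s' $(seq 1 500))
agent-browser --session test-feature fill @e3 "$LONG <script>alert(1)</script> ünïçødé"
```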
2. Create todos for each scenario
Use the todo list to create an entry for each test scenario so progress is tracked. Each todo should be the scenario name.
3. Present the strategy
Output the full testing strategy as a numbered list before proceeding. The user should be able to see exactly what will be tested before Phase 2 starts.
Phase 2: Test with agent-browser
`agent-browser` is a CLI tool — invoke it via bash, not as an MCP tool or skill.
Docs: https://github.com/vercel-labs/agent-browser
Key commands: `open`, `snapshot`, `click`, `fill`, `screenshot`, `record`, `wait`, `find`, `close`
1. Open the app
```bash
agent-browser --session test-feature open http://localhost:PORT
agent-browser --session test-feature wait --load networkidle
```
2. Start video recording — MANDATORY
Always start a video recording before any interaction. No exceptions.
```bash
agent-browser --session test-feature record start /tmp/aiden-captures/happy-path.webm
```
If video recording fails for technical reasons, note the failure but continue — screenshots are still required.
3. Take initial screenshot — MANDATORY
Always capture the initial page state before any interaction.
```bash
agent-browser --session test-feature screenshot /tmp/aiden-captures/step-00-initial-state.png
```
4. Execute each scenario from the testing strategy
Work through every scenario defined in Phase 1.5. For each scenario:
- Mark the corresponding todo as in-progress
- Use `agent-browser snapshot -i` to discover interactive elements
- Use `agent-browser click @eN`, `agent-browser fill @eN "text"`, etc. to interact
- Use `agent-browser wait --load networkidle` or a short `sleep` between actions
- Take a screenshot after every significant state change — never go more than 2 meaningful actions without a screenshot:
```bash
agent-browser --session test-feature screenshot /tmp/aiden-captures/step-NN-description.png
```
Name screenshots descriptively, e.g. `step-03-submit-clicked.png`, `step-04-success-state.png`
- Re-snapshot after navigation or DOM changes (refs go stale)
- If elements are hard to find by ref, use semantic locators:
```bash
agent-browser --session test-feature find text "Submit" click
agent-browser --session test-feature find role button click --name "Save"
```
- Mark each todo as complete or failed based on outcome (a worked sketch of one scenario follows this list)
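A worked sketch of one scenario pass (the element refs and form fields are hypothetical; discover the real refs via `snapshot -i`):
```bash
# Scenario: happy path, create item
agent-browser --session test-feature snapshot -i                     # list interactive elements
agent-browser --session test-feature fill @e2 "My first item"        # hypothetical name field
agent-browser --session test-feature click @e5                       # hypothetical Submit button
agent-browser --session test-feature wait --load networkidle
agent-browser --session test-feature screenshot /tmp/aiden-captures/step-01-item-created.png
agent-browser --session test-feature snapshot -i                     # re-snapshot: refs go stale after DOM changes
```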
Mandatory coverage checklist (execute ALL that apply to the coverage level chosen):
- Happy path (always required): the main flow completes end-to-end with the expected outcome
- Non-happy paths (required for standard + comprehensive): invalid/missing inputs, empty states, and the other cases listed in Phase 1.5
- Error states (required for standard + comprehensive): failed API calls, permission denied, and visible error messaging
- Visual & navigation (required for comprehensive): responsiveness, boundary values, and back-button/mid-flow navigation
5. Stop recording and clean up
```bash
agent-browser --session test-feature record stop
agent-browser --session test-feature screenshot /tmp/aiden-captures/final-state.png
agent-browser --session test-feature close
```
Keep the recording under 2 minutes. If the feature requires more exploration, split into multiple recordings (e.g. `happy-path.webm`, `edge-cases.webm`).
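A sketch of the split, assuming two recording segments:
```bash
# Segment 1: happy path
agent-browser --session test-feature record start /tmp/aiden-captures/happy-path.webm
# ... happy-path steps ...
agent-browser --session test-feature record stop

# Segment 2: edge cases
agent-browser --session test-feature record start /tmp/aiden-captures/edge-cases.webm
# ... edge-case steps ...
agent-browser --session test-feature record stop
```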
6. Fix WebM duration metadata (MANDATORY if ffmpeg is available)
WebM files recorded by agent-browser often have a missing duration in the container header — the video player then shows 0:00. Fix every `.webm` file by remuxing it through ffmpeg, which reads the entire file, computes the real duration, and writes it into the output header:
```bash
for f in /tmp/aiden-captures/*.webm; do
  if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg -y -i "$f" -c copy "${f%.webm}-fixed.webm" 2>/dev/null \
      && mv "${f%.webm}-fixed.webm" "$f" \
      || echo "ffmpeg remux failed for $f — uploading as-is"
  fi
done
```
If ffmpeg is not available, skip this step and continue — the video will still play, it just won't show the correct duration in the player.
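Optionally verify the fix (`ffprobe` ships alongside ffmpeg, so it should be available whenever the remux ran):
```bash
# Should print a non-zero duration in seconds for each fixed file
for f in /tmp/aiden-captures/*.webm; do
  ffprobe -v error -show_entries format=duration -of csv=p=0 "$f"
done
```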
Phase 3: Upload Captures to S3
For each captured file, use the `mcp__aiden__get_upload_url` MCP tool to get a presigned S3 URL, then `PUT` the file directly to S3.
Per-file upload flow
1. Get the file size (needed by the MCP tool):
```bash
SIZE=$(stat -c%s "/tmp/aiden-captures/step-01.png" 2>/dev/null || stat -f%z "/tmp/aiden-captures/step-01.png")
```
2. Call the MCP tool to get a presigned upload URL:
Tool: mcp__aiden__get_upload_url
Parameters: {
  "taskId": "<active task ID or value of AIDEN_TASK_ID>",
  "filename": "step-01.png",
  "mimeType": "image/png",
  "size": <SIZE from step 1>
}
Returns:
{ "uploadUrl": "https://...", "s3Key": "sandbox-captures/..." }
3. Upload the file to S3 using the presigned URL:
```bash
curl -sf -X PUT "<uploadUrl>" \
  -H "Content-Type: image/png" \
  --data-binary @/tmp/aiden-captures/step-01.png
```
4. Save the `s3Key` — you will pass it to `mcp__aiden__create_test_report` in Phase 4.
Repeat for each screenshot and video file. Common MIME types:
- Screenshots: `image/png`
- Video recordings: `video/webm`
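A sketch of the per-file bookkeeping (the MCP call itself happens as a tool call, not in bash; the extension-to-MIME mapping assumes only PNG and WebM captures):
```bash
for f in /tmp/aiden-captures/*; do
  case "$f" in
    *.png)  MIME="image/png" ;;
    *.webm) MIME="video/webm" ;;
    *)      continue ;;  # skip anything unexpected
  esac
  SIZE=$(stat -c%s "$f" 2>/dev/null || stat -f%z "$f")
  echo "$f size=$SIZE mime=$MIME"  # feed these into mcp__aiden__get_upload_url
done
```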
If uploads fail: Continue to Phase 4 anyway — omit the `screenshotUrl`, `videoUrl`, and `screenshotUrls` fields. The structured report is still valuable without media.
Phase 4: Create Structured Test Report (MANDATORY — NEVER SKIP)
CRITICAL: You MUST call `mcp__aiden__create_test_report` to complete this skill. This is non-negotiable.
- Do NOT output the report as markdown text
- Do NOT summarize findings in chat only
- Do NOT skip this phase for any reason — not because of time, not because uploads failed, not because testing was partial
- The report MUST be persisted via this MCP tool call so the UI renders it as an interactive artifact
- Skipping this step means the entire test session is wasted and untracked
If uploads failed in Phase 3, create the report anyway — omit the media URLs but include all steps, issues, and summary text.
Call the `mcp__aiden__create_test_report` MCP tool with structured data from your testing.
Gather all the data from the previous phases and call the tool:
Tool: mcp__aiden__create_test_report
Parameters: {
"title": "<short description of what was tested>",
"taskId": "<active task ID or value of AIDEN_TASK_ID, if set>",
"conversationId": "<active conversation ID or value of AIDEN_SESSION_ID>",
"teamId": "<resolved team ID>",
"branch": "<current branch name from git>",
"baseBranch": "main",
"commits": [
{ "sha": "<commit sha>", "message": "<commit message>" }
],
"changedFiles": [
{ "path": "src/components/Feature.tsx", "description": "Added new feature component" }
],
"appUrl": "<the URL you tested>",
"summary": "<1-2 paragraph summary of what was tested and the outcome>",
"status": "pass | fail | mixed",
"steps": [
{
"name": "Navigate to feature page",
"status": "pass | fail | skip",
"screenshotUrl": "<s3Key from Phase 3 — e.g. sandbox-captures/org/task/step-01.png — omit if upload failed>",
"notes": "Page loaded correctly"
}
],
"issues": [
{
"title": "Button misaligned on mobile",
"severity": "critical | high | medium | low",
"description": "The submit button overflows on viewports < 375px",
"screenshotUrl": "<s3Key from Phase 3 showing the issue — omit if upload failed>"
}
],
"videoUrl": "<s3Key from Phase 3 for the happy-path recording — omit if upload failed>",
"screenshotUrls": ["<s3Key 1 from Phase 3>", "<s3Key 2 from Phase 3>"]
}
Field guidelines
- status: "pass" if all steps passed, "fail" if any critical step failed, "mixed" if some passed and some failed
- steps: one entry per distinct test action (navigate, click, fill, verify). Include the screenshot URL for that step if you took one.
- issues: only include actual problems found. Each issue should have a severity and clear description.
- screenshotUrls: flat list of ALL screenshot s3Keys from Phase 3 (for gallery display). Use the `s3Key` field returned by `mcp__aiden__get_upload_url`, not the `uploadUrl`. Omit if no uploads succeeded.
- commits and changedFiles: from git analysis in Phase 1
Output
After calling `mcp__aiden__create_test_report`, tell the user:
- The test report title, so the user can find the generated artifact in the Aiden UI
- A link to the artifact when you have enough context to form one (usually `/teams/<teamId>/docs?artifactId=<artifactId>`)
- A brief summary: what was tested, how many steps passed/failed, any issues found
- List any bugs or concerns discovered during testing
Do not lead with a raw UUID. Only include the artifact/report ID as secondary debug context if no usable title or link is available.
Do NOT render the full report as markdown. The MCP tool call creates the report as a structured artifact that the UI displays interactively.