usability-testing
Usability Testing
Scope
Covers
- Designing task-based usability studies tied to a specific product decision
- Testing live flows, prototypes, and “faked” implementations (fake door, Wizard of Oz)
- Running moderated sessions (remote or in-person) and capturing high-quality evidence
- Turning findings into a prioritized fix list (including high-ROI microcopy/CTA improvements)
When to use
- “Create a usability test plan and script for <flow>.”
- “We need to test a prototype with 5–8 users next week.”
- “Validate a value proposition before building (fake door / Wizard of Oz).”
- “Help me synthesize usability findings into a prioritized backlog.”
When NOT to use
- You need statistically reliable estimates or causal impact (use analytics/experimentation)
- You need open-ended discovery (“what problems do users have?”) → use conducting-user-interviews
- You’re working with high-risk populations or sensitive topics (medical, legal, minors) without appropriate approvals/training
- You don’t have a concrete scenario/flow to evaluate (clarify the decision first)
Inputs
Minimum required
- Product + target user segment (who, context of use)
- The decision this test should inform (what will change) + timeline
- What you’re testing (flow/feature) + prototype/build link (or “recommend stimulus”)
- Platform + environment (web/mobile/desktop; remote/in-person)
- Constraints: session type, number of participants, incentives, recording policy, privacy constraints
Missing-info strategy
- Ask up to 5 questions from references/INTAKE.md.
- If still unknown, proceed with explicit assumptions and list Open questions that would change the plan.
Outputs (deliverables)
Produce a Usability Test Pack in Markdown (in-chat; or as files if requested):
- Context snapshot (decision, users, what’s being tested, constraints)
- Test plan (method, prototype strategy, hypotheses/risks, success criteria)
- Participant plan (criteria, recruiting channels, schedule + backups)
- Moderator guide + task script (neutral tasks, probes, wrap-up)
- Note-taking template + issue log (severity/impact, evidence)
- Synthesis readout (findings, prioritized issues, recommendations, quick wins)
- Risks / Open questions / Next steps (always included)
Templates: references/TEMPLATES.md
Expanded heuristics: references/WORKFLOW.md
Workflow (8 steps)
1) Frame the decision and the “why now”
- Inputs: User context; references/INTAKE.md.
- Actions: Define the decision, primary unknowns, and the minimum you need to learn to make the call.
- Outputs: Context snapshot + research questions/hypotheses.
- Checks: You can answer: “What will we do differently after this test?”
2) Choose the right stimulus (real vs prototype vs faked)
- Inputs: What’s being tested; constraints.
- Actions: Select the cheapest valid setup: live product, clickable prototype, fake door, Wizard of Oz, or concierge flow.
- Outputs: Prototype strategy + what will be real vs simulated.
- Checks: The setup tests the core value/behavior (not pixel perfection).
3) Define tasks and success criteria (keep it neutral)
- Inputs: User goals + scenarios.
- Actions: Write 5–8 realistic tasks (each with a starting state), success criteria, and key observables (hesitation, errors, workarounds).
- Outputs: Task list (draft) + observation plan.
- Checks: Tasks don’t reveal UI labels (“Click the X button”); they reflect real intent.
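One way to keep the task list honest is to store each task as structured data and scan the wording for UI giveaways. The sketch below is only an illustration: the `Task` fields, the `LEADING_WORDS` set, and both sample tasks are hypothetical, and a word match is a prompt for human review rather than a verdict.

```python
import re
from dataclasses import dataclass, field

# Hypothetical words that tend to reveal the UI; adjust for your product.
LEADING_WORDS = {"click", "tap", "press", "button", "menu", "tab"}


@dataclass
class Task:
    scenario: str          # what the participant is told
    starting_state: str    # where the participant begins this task
    success_criteria: str  # what counts as completion
    observables: list[str] = field(
        default_factory=lambda: ["hesitation", "errors", "workarounds"]
    )


def leading_words_in(task: Task) -> list[str]:
    """Return the UI-revealing words found in the scenario, sorted alphabetically."""
    words = re.findall(r"[a-z']+", task.scenario.lower())
    return sorted(set(words) & LEADING_WORDS)


neutral = Task(
    scenario="You just joined a new team and want your teammates to see your project.",
    starting_state="Logged in, empty dashboard",
    success_criteria="Project is shared with at least one teammate",
)
leading = Task(
    scenario="Click the Share button to invite a teammate.",
    starting_state="Logged in, project open",
    success_criteria="Invite sent",
)

print(leading_words_in(neutral))  # no flagged words
print(leading_words_in(leading))  # flags "button" and "click"
```

A flagged task is usually fixed by restating the goal in the participant's terms and leaving the path to them.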
4) Pick participants + recruiting plan (include buffers)
- Inputs: Target segment, access to users.
- Actions: Set inclusion/exclusion criteria; choose channels; build a schedule with backups and slack for no-shows and busy participants.
- Outputs: Participant plan + recruiting copy/screener (as needed).
- Checks: Participants match the scenario (behavior/context), not just demographics.
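The "backups and slack" arithmetic can be sketched as follows. The no-show percentage is an assumption you supply from your own recruiting history; it does not come from this guide, and the function name is hypothetical.

```python
def sessions_to_schedule(target_completed: int, assumed_no_show_percent: int) -> int:
    """Sessions to book so the expected number of completed sessions meets the target."""
    if target_completed < 0:
        raise ValueError("target_completed must be non-negative")
    if not 0 <= assumed_no_show_percent < 100:
        raise ValueError("assumed_no_show_percent must be between 0 and 99")
    show_percent = 100 - assumed_no_show_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-target_completed * 100 // show_percent)


target = 6
booked = sessions_to_schedule(target, assumed_no_show_percent=20)
print(f"Book {booked} sessions ({booked - target} backups) to finish about {target}.")
```

With a target of 6 and an assumed 20% no-show rate, this books 8 sessions, leaving 2 backups.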
5) Build the moderator guide + instrumentation
- Inputs: Task list + prototype.
- Actions: Create the script (intro/consent, warm-up, tasks, probes, wrap-up). Assign note-taker roles; decide what to record.
- Outputs: Moderator guide + notes template + issue log.
- Checks: The guide avoids leading questions and includes “what would you do next?” probes.
6) Run sessions and capture evidence (optional “reality checks”)
- Inputs: Guide, logistics, participants.
- Actions: Run sessions; capture verbatims, errors, rough time-on-task, and moments of confusion. Optionally observe comparable flows “in the wild.”
- Outputs: Completed notes per session + populated issue log.
- Checks: Every issue has at least one concrete example (quote/screenshot/time/step) attached.
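The evidence check in this step is easy to automate if the issue log is kept as structured data. This is a minimal sketch: the `Issue` and `Evidence` shapes and the sample entries are hypothetical, not the format defined in references/TEMPLATES.md.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    kind: str    # "quote", "screenshot", "time", or "step"
    detail: str


@dataclass
class Issue:
    title: str
    task: str
    evidence: list[Evidence] = field(default_factory=list)


def issues_missing_evidence(issue_log: list[Issue]) -> list[str]:
    """Return titles of issues that have no concrete example attached."""
    return [issue.title for issue in issue_log if not issue.evidence]


# Illustrative entries only; not real findings.
issue_log = [
    Issue(
        "Could not find export option",
        "Task 3",
        [Evidence("quote", "I'd expect it under File.")],
    ),
    Issue("Pricing page felt confusing", "Task 5"),
]

print(issues_missing_evidence(issue_log))
```

Any title this returns should either get a quote, screenshot, timestamp, or step reference attached, or be dropped from the log before synthesis.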
7) Synthesize into prioritized fixes (micro wins count)
- Inputs: Notes + issue log.
- Actions: Cluster issues; label severity and frequency; connect to funnel/business impact; propose fixes (including microcopy/CTA tweaks).
- Outputs: Synthesis readout + prioritized recommendations/backlog.
- Checks: Each recommendation ties to evidence and an expected impact (directional).
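As a rough illustration of combining severity and frequency into a ranking, the sketch below multiplies a severity weight by the share of participants affected. The weights, field names, and sample issues are all assumptions made for this example. With 5 to 8 participants the score is directional at best, so treat it as a sorting aid, not a measurement.

```python
from dataclasses import dataclass

# Hypothetical severity scale; replace with the one your team already uses.
SEVERITY_WEIGHT = {"critical": 4, "major": 3, "minor": 2, "cosmetic": 1}


@dataclass
class ClusteredIssue:
    title: str
    severity: str
    participants_affected: int
    proposed_fix: str


def prioritize(issues: list[ClusteredIssue], total_participants: int) -> list[ClusteredIssue]:
    """Sort issues by severity weight times the share of participants affected."""
    def score(issue: ClusteredIssue) -> float:
        frequency = issue.participants_affected / total_participants
        return SEVERITY_WEIGHT[issue.severity] * frequency

    return sorted(issues, key=score, reverse=True)


# Illustrative data only; not real findings.
issues = [
    ClusteredIssue("CTA label unclear", "minor", 5, "Rename CTA to describe the outcome"),
    ClusteredIssue("Checkout blocked by validation error", "critical", 2, "Fix address validation"),
    ClusteredIssue("Icon misaligned", "cosmetic", 1, "Adjust spacing"),
]

ranked = prioritize(issues, total_participants=6)
for issue in ranked:
    print(f"{issue.title}: {issue.proposed_fix}")
```

In this sample a minor but widespread microcopy problem outranks a critical issue seen by fewer participants, which is one reason to review the ranking by hand before committing the backlog.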
8) Share, decide, and run the quality gate
- Inputs: Draft pack.
- Actions: Produce a shareable readout and propose next steps (design iteration, follow-up test, experiment). Run references/CHECKLISTS.md and score the pack against references/RUBRIC.md.
- Outputs: Final Usability Test Pack + Risks/Open questions/Next steps.
- Checks: A stakeholder can make a “ship / fix / retest” decision asynchronously.
Quality gate (required)
- Use references/CHECKLISTS.md and references/RUBRIC.md.
- Always include: Risks, Open questions, Next steps.
Examples
Example 1 (Prototype test): “Create a usability test plan + moderator guide to evaluate our new onboarding flow (web) with 6 first-time users next week.”
Expected: full Usability Test Pack with neutral tasks, recruiting criteria, session logistics, and a synthesis structure.
Example 2 (Wizard of Oz): “We want to test an ‘AI auto-triage’ feature before building it. Design a Wizard of Oz usability test plan and script for 5 sessions.”
Expected: stimulus plan defining what’s simulated, tasks focused on value, and an issue log + readout.
Boundary example: “Run a usability test to prove the redesign will increase retention by 10%.”
Response: explain the limits of small-n usability testing; recommend pairing it with instrumentation/experimentation for causality and using usability testing to diagnose friction.