Investigate

调研(Investigate)

Overview

概述

Research is the foundation of intentional design. Without evidence, design is decoration — it might look right, but it won't be right. This skill guides the full research lifecycle: planning what to learn, choosing the right method, executing with rigor, synthesizing into actionable insights, and communicating findings that drive decisions.
The gap this fills is specific: /strategize identifies what needs to be understood through the five foundational questions, but doesn't guide how to understand it. /investigate owns that how. You plan the study, write the interview guide, design the test protocol, structure the survey, run the synthesis, and deliver findings in a format that feeds directly back into the strategic frame.
Research is not a phase you pass through once. It's a practice you return to whenever assumptions stack up, confidence erodes, or the design conversation drifts from evidence into opinion.

研究是有意图设计的基础。没有实证支撑,设计就只是装饰——看起来可能不错,但实际并不可行。本技能指导完整的研究生命周期:规划研究内容、选择合适方法、严谨执行、将结果整合为可落地的洞察,以及传达能驱动决策的研究发现。
它填补的空白十分明确:/strategize 通过五个基础问题确定需要理解的内容,但并未指导如何去理解。/investigate 则负责解决“如何做”的问题。你可以规划研究、撰写访谈指南、设计测试方案、构建调研结构、开展结果整合,并以能直接反馈到战略框架的形式交付研究发现。
研究不是一个一劳永逸的阶段。每当假设堆积、信心不足,或是设计讨论从实证转向主观意见时,都需要回归研究实践。

Skill family

技能体系关联

/investigate connects to the full Intent skill system:
  • /strategize: Your primary partner. Their five foundational questions — problem validation, audience definition, solution fit, feature validation, competitive landscape — identify WHAT to research. You determine HOW. When research is complete, findings flow back to /strategize for synthesis into the strategic frame.
  • /blueprint: Your findings about how users experience systems, services, and processes inform their architectural decisions. Share journey-based synthesis and contextual inquiry findings directly.
  • /journey: Usability test findings and contextual inquiry observations feed directly into flow design. Share task completion data, error patterns, and observed navigation behaviors.
  • /organize: Card sort and tree test results are direct inputs for information architecture. Share clustering patterns, mental models, and navigation expectations.
  • /articulate: Interview language, terminology patterns, and content comprehension findings inform content strategy. Share how users actually talk about the problem.
  • /evaluate: Your findings inform their assessment criteria. When /evaluate identifies usability issues, you may be called back to investigate root causes through targeted research.
  • /measure: The quantitative complement to your qualitative work. Survey data and analytics review bridge the two skills. When their metrics reveal behavioral patterns, you investigate the why behind the numbers.
  • /philosopher: Enter when research findings surprise you, contradict team assumptions, or reveal that you've been asking the wrong questions. The philosopher helps you sit with uncomfortable findings before rushing to reframe them.

/investigate 与完整的 Intent 技能系统相连:
  • /strategize:你的核心协作伙伴。它的五个基础问题——问题验证、受众定义、解决方案适配、功能验证、竞争格局——明确了研究的内容(WHAT)。而你负责确定研究的方式(HOW)。研究完成后,结果会反馈给 /strategize,整合到战略框架中。
  • /blueprint:你关于用户体验系统、服务和流程的研究发现,会为他们的架构决策提供信息。直接分享基于用户旅程的整合结果和情境调查发现。
  • /journey:可用性测试结果和情境调查观察会直接为流程设计提供输入。分享任务完成数据、错误模式和观察到的导航行为。
  • /organize:卡片分类和树状测试结果是信息架构的直接输入。分享聚类模式、心智模型和导航预期。
  • /articulate:访谈中的语言、术语模式和内容理解发现,会为内容策略提供信息。分享用户实际描述问题的方式。
  • /evaluate:你的研究发现会为他们的评估标准提供依据。当 /evaluate 识别出可用性问题时,你可能会被召回,通过针对性研究调查根本原因。
  • /measure:它是你定性研究的定量补充。调研数据和分析回顾将这两项技能联系起来。当他们的指标揭示行为模式时,你需要调查数据背后的“原因”。
  • /philosopher:当研究发现让你惊讶、与团队假设矛盾,或是揭示出你一直在问错误的问题时,可借助该技能。哲学家能帮助你在急于重新解读之前,先接纳这些令人不适的发现。

Core capabilities

核心能力

1. Research planning & method selection

1. 研究规划与方法选择

The most common research mistake is choosing a method before defining the question. Start with what you need to learn, then pick the method that answers it with the right fidelity, within the constraints you have.
Method framework:
Method | Purpose | Sample size | Duration | Best for
Interviews | Generative understanding | 5-8 for thematic saturation | 45-60 min each | Motivations, mental models, unmet needs, context
Usability tests | Evaluative assessment | 5 per round catches ~85% of issues | 30-60 min each | Task completion, error patterns, learnability
Surveys | Quantitative validation | 100+ for statistical significance | 5-15 min to complete | Prevalence, preference, satisfaction, demographics
Diary studies | Longitudinal behavior | 10-15 participants | 1-4 weeks | Habits, context shifts, real-world usage over time
Contextual inquiry | In-situ observation | 4-6 sessions | 60-90 min each | Actual workflows, environment factors, workarounds
Card sorts | Mental model mapping | 15+ open / 30+ closed | 15-30 min each | Category expectations, labeling, grouping logic
Tree tests | Navigation validation | 50+ participants | 10-15 min each | Findability, hierarchy effectiveness
Analytics review | Behavioral patterns | Requires existing product data | Varies | Drop-off points, usage frequency, feature adoption
Competitive analysis | Market understanding | 5-10 competitors | Days to weeks | Positioning, feature gaps, differentiation opportunities
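The "~85% of issues" figure for usability tests in the table above comes from the standard problem-discovery model, 1 - (1 - p)^n. A quick check, assuming the commonly cited average per-participant detection probability of roughly 0.31 (an assumption; real values vary widely by product and task):

```python
# Problem-discovery model behind "5 per round catches ~85% of issues".
# Assumption: average per-participant detection probability p of about 0.31,
# the commonly cited figure; real studies vary widely.
p = 0.31
for n in (1, 3, 5, 8):
    found = 1 - (1 - p) ** n
    print(f"{n} participants: ~{found:.0%} of problems at that detectability")
```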
Choosing the right method — decision framework:
  • "We don't know what we don't know" → Interviews, contextual inquiry. Start generative. Don't survey before you know what to ask.
  • "We have a hypothesis and need to validate it" → Usability tests, surveys, A/B tests. Evaluative methods require something specific to test.
  • "We need to understand behavior over time" → Diary studies. Cross-sectional methods miss how behavior evolves.
  • "We need to structure information" → Card sorts, tree tests. These are specific tools for specific IA questions.
  • "We need to size the opportunity" → Surveys, analytics review. Qualitative research reveals patterns; quantitative research reveals prevalence.
Trade-offs to make explicit:
  • Time vs. depth: Interviews take weeks to recruit, conduct, and synthesize. Surveys can launch in days. But surveys can only ask about what you already know to ask about.
  • Sample size vs. richness: 5 interviews will give you richer understanding than 500 survey responses for generative questions. But 5 interviews won't tell you whether a pattern is common or rare.
  • Generative vs. evaluative: Generative research (interviews, contextual inquiry) explores the problem space. Evaluative research (usability tests, surveys) assesses specific solutions. Don't evaluate before you've generated; don't generate when you need to evaluate.
  • Remote vs. in-person: Remote is faster, cheaper, and reaches more diverse participants. In-person captures environment, body language, and context that remote misses. Choose based on what you need to observe.
最常见的研究错误是在明确问题之前就选择方法。先确定你需要了解的内容,再选择能在现有约束下提供合适保真度答案的方法。
方法框架:
方法 | 目的 | 样本量 | 时长 | 适用场景
访谈 | 生成式理解 | 达到主题饱和需5-8人 | 每人45-60分钟 | 动机、心智模型、未被满足的需求、情境
可用性测试 | 评估性分析 | 每轮5人可发现约85%的问题 | 每人30-60分钟 | 任务完成情况、错误模式、可学习性
调研问卷 | 定量验证 | 需100+样本以获得统计显著性 | 填写时长5-15分钟 | 普遍性、偏好、满意度、人口统计信息
日记研究 | 纵向行为追踪 | 10-15名参与者 | 1-4周 | 习惯、情境变化、真实世界长期使用情况
情境调查 | 现场观察 | 4-6场会话 | 每场60-90分钟 | 实际工作流程、环境因素、变通方法
卡片分类 | 心智模型映射 | 开放式需15+,封闭式需30+ | 每人15-30分钟 | 分类预期、标签命名、分组逻辑
树状测试 | 导航验证 | 50+参与者 | 每人10-15分钟 | 可查找性、层级有效性
数据分析回顾 | 行为模式分析 | 需要现有产品数据 | 时长不定 | 流失点、使用频率、功能采用情况
竞品分析 | 市场理解 | 5-10个竞品 | 数天至数周 | 定位、功能差距、差异化机会
方法选择决策框架:
  • “我们不知道自己不知道什么” → 访谈、情境调查。从生成式研究开始。在明确要问的问题前,不要做调研问卷。
  • “我们有一个假设需要验证” → 可用性测试、调研问卷、A/B测试。评估性方法需要明确的测试对象。
  • “我们需要了解长期行为变化” → 日记研究。横断面方法无法捕捉行为的演变。
  • “我们需要构建信息结构” → 卡片分类、树状测试。这些是针对信息架构问题的特定工具。
  • “我们需要评估机会规模” → 调研问卷、数据分析回顾。定性研究揭示模式;定量研究揭示普遍性。
需要明确的权衡:
  • 时间 vs. 深度:访谈需要数周时间招募、执行和整合。调研问卷可在数天内启动。但调研问卷只能询问你已经知道要问的内容。
  • 样本量 vs. 丰富度:对于生成式问题,5次访谈能提供比500份调研问卷更丰富的理解。但5次访谈无法告诉你某个模式是普遍还是罕见。
  • 生成式 vs. 评估式:生成式研究(访谈、情境调查)探索问题空间。评估式研究(可用性测试、调研问卷)评估特定解决方案。不要在生成研究前进行评估;不要在需要评估时进行生成研究。
  • 远程 vs. 线下:远程研究更快、成本更低,且能覆盖更多样化的参与者。线下研究能捕捉到远程研究无法获得的环境、肢体语言和情境信息。根据你需要观察的内容选择合适方式。

2. Interview guide construction

2. 访谈指南构建

A great interview guide feels like a conversation outline, not a questionnaire. The goal is to create space for participants to tell you things you didn't know to ask about.
Structure:
Opening (5-10 minutes):
  • Introduce yourself and the purpose (honest but not leading)
  • Obtain informed consent — recording permission, data usage, right to stop
  • Establish rapport: "Tell me a bit about your role / your typical day"
  • Set context: "We're interested in learning about [domain], not testing you — there are no wrong answers"
Core questions (30-40 minutes):
  • Open with broad, behavior-focused questions: "Walk me through the last time you [activity]"
  • Move from general to specific — let participants set the direction first
  • Use scenario-based questions grounded in past behavior: "Think about the most recent time you struggled with X. What happened?"
  • Follow the participant's thread, not your script. The guide is a safety net, not a railroad.
Probing techniques:
  • Silence. The most underrated probe. Wait 5-7 seconds after an answer. Participants often fill silence with the most revealing detail.
  • "Tell me more about that." Open-ended, non-directive. Works in almost any situation.
  • "Walk me through that step by step." Forces specificity. Turns "I usually just figure it out" into a detailed process description.
  • "Why" ladder. Ask "why" 3-5 times to move from surface behavior to underlying motivation. But use "what made you..." or "how did you decide to..." instead of literal "why" — it's less confrontational.
  • Reflecting back. "So if I understand correctly, you [paraphrase]. Is that right?" Confirms understanding and shows you're listening, which encourages deeper sharing.
Closing (5-10 minutes):
  • Summarize key themes you heard — give participants a chance to correct or add
  • "Is there anything about [topic] that I should have asked about but didn't?"
  • Explain next steps and timeline
  • Thank them genuinely
Interview anti-patterns — what to never do:
  • Leading questions. "Don't you find that X is frustrating?" tells the participant what you want to hear. Ask "How do you feel about X?" instead.
  • Hypothetical scenarios. "Would you use a tool that does X?" People are terrible at predicting future behavior. Ask about past behavior: "When was the last time you needed to do X? What did you do?"
  • Asking what people "would" do. "Would" questions get aspirational answers. "Did" questions get truthful ones. "What would you do if..." → "What did you do last time..."
  • Compound questions. "Do you find the process slow and confusing?" — which one are they answering? Ask one thing at a time.
  • Jargon. Use the participant's language, not yours. If they say "the main screen," don't correct them to "the dashboard." Note the difference — it's data.
  • Asking for design solutions. "What feature would you want?" makes participants play designer. Ask about problems instead: "What's the hardest part of this process?"
优秀的访谈指南更像是对话大纲,而非调查问卷。目标是为参与者创造空间,分享那些你未曾想到要询问的内容。
结构:
开场(5-10分钟):
  • 自我介绍并说明研究目的(诚实且无引导性)
  • 获取知情同意——录音许可、数据使用方式、随时终止的权利
  • 建立融洽关系:“请告诉我一些关于你的角色/日常工作的情况”
  • 设置情境:“我们希望了解[领域],不是在测试你——没有错误答案”
核心问题(30-40分钟):
  • 从宽泛的、聚焦行为的问题开始:“请带我回顾你最近一次[行为]的全过程”
  • 从一般到具体——先让参与者主导方向
  • 使用基于过往行为的场景化问题:“想想你最近一次在X上遇到困难的经历。发生了什么?”
  • 跟随参与者的思路,而非你的脚本。指南是安全网,不是固定路线。
追问技巧:
  • 沉默:最被低估的追问方式。在参与者回答后等待5-7秒。参与者通常会用最具启发性的细节填补沉默。
  • “请告诉我更多关于这一点的内容。” 开放式、无引导性。几乎适用于所有场景。
  • “请一步步带我回顾那个过程。” 迫使参与者提供具体细节。将“我通常自己搞定”转化为详细的流程描述。
  • “为什么”阶梯式追问:连续问3-5次“为什么”,从表面行为深入到潜在动机。但用“是什么让你……”或“你是如何决定……”代替直接的“为什么”——这样更少带有对抗性。
  • 反馈确认:“如果我理解正确的话,你[复述内容]。对吗?” 确认理解并表明你在倾听,这会鼓励参与者分享更多。
收尾(5-10分钟):
  • 总结你听到的关键主题——给参与者纠正或补充的机会
  • “关于[主题],有没有什么我应该问但没问的内容?”
  • 说明后续步骤和时间安排
  • 真诚地感谢参与者
访谈禁忌——绝对不要做的事:
  • 引导性问题:“你不觉得X很令人沮丧吗?”会告诉参与者你想听到什么。改为问“你对X的感受如何?”
  • 假设性场景:“你会使用一个能做X的工具吗?”人们不擅长预测未来行为。询问过往行为:“你上次需要做X是什么时候?你是怎么做的?”
  • 询问人们“会”做什么:“会”的问题得到的是理想化答案。“做过”的问题得到的是真实答案。将“如果……你会怎么做?”改为“上次……你做了什么?”
  • 复合问题:“你觉得这个过程既慢又混乱吗?”——他们回答的是哪一个?一次只问一件事。
  • 行话:使用参与者的语言,而非你的专业术语。如果他们说“主屏幕”,不要纠正为“仪表盘”。记录这种差异——这是数据。
  • 询问设计解决方案:“你想要什么功能?”会让参与者扮演设计师。改为询问问题:“这个过程中最困难的部分是什么?”

3. Usability test planning

3. 可用性测试规划

Usability testing answers one question: can people use this thing to accomplish what they need to? Everything in the test plan serves that question.
Task design:
  • Write tasks as realistic scenarios, not instructions. Not "Click the Settings button" but "You want to change your notification preferences. How would you do that?"
  • Include the user's goal, not the system's path. Let the participant find the path — that's the test.
  • Start with an easy task to build confidence. End with the most complex task while attention is still present.
  • 5-7 tasks per session is the practical maximum. Each task takes 3-10 minutes with think-aloud.
  • Pilot test every task with a colleague first. If the task wording confuses the pilot, it will confuse participants.
Think-aloud protocol:
  • Explain before starting: "As you work through these tasks, please say out loud what you're thinking — what you notice, what you expect, what confuses you."
  • Demonstrate with a brief example (navigate a simple website while narrating your thoughts).
  • Prompt gently when participants go silent: "What are you thinking right now?" or "What are you looking for?"
  • Do not help. Do not hint. Do not answer questions with answers. Redirect: "What would you normally do if I weren't here?"
Severity rating framework:
  • Cosmetic (1): Noticed but doesn't affect task completion. Fix when convenient.
  • Minor (2): Causes slight delay or confusion but participants recover. Fix in next release.
  • Major (3): Causes significant difficulty; some participants fail the task. Fix before launch.
  • Catastrophic (4): Prevents task completion entirely. Fix immediately.
Rate each finding independently by two people. Discuss disagreements — they reveal assumptions about user tolerance.
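A minimal sketch of how the two-rater rule could be tracked. The 1-4 scale mirrors the framework above; the finding names and the code structure itself are illustrative, not prescribed by this skill:

```python
from enum import IntEnum

class Severity(IntEnum):
    COSMETIC = 1      # noticed, doesn't affect task completion
    MINOR = 2         # slight delay or confusion; participant recovers
    MAJOR = 3         # significant difficulty; some participants fail
    CATASTROPHIC = 4  # prevents task completion entirely

# Two independent raters per finding; disagreements get discussed, not averaged.
ratings = {
    "setup-step-3-permissions": (Severity.MAJOR, Severity.CATASTROPHIC),
    "truncated-tooltip-label": (Severity.COSMETIC, Severity.COSMETIC),
}
to_discuss = [finding for finding, (a, b) in ratings.items() if a != b]
print(to_discuss)  # ['setup-step-3-permissions']
```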
Moderated vs. unmoderated:
  • Moderated: You're present, can probe on confusion, observe body language, adapt on the fly. Best for complex tasks, early concepts, and when you need to understand why someone struggled.
  • Unmoderated: Participants complete tasks on their own (via tool like UserTesting, Maze, Lookback). Faster, cheaper, larger sample. Best for straightforward evaluative tasks on stable prototypes.
Remote vs. in-person:
  • Remote: Broader participant pool, faster scheduling, screen sharing captures the interaction. Miss environmental context and body language nuance.
  • In-person: See the full picture — environment, posture, peripheral behavior. Better for physical products, complex workflows, or when context is critical to the task.
Observer guidelines:
  • Observers watch, they don't moderate. No gasping, no whispering, no "that's not how it works."
  • Provide a structured note-taking template: timestamp, observation, severity, which task.
  • Debrief with observers after each session — fresh observations fade fast.
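Task completion rates from five-person sessions are directional, not statistical, and it helps to show how wide the plausible range is when you report them. A hedged sketch using the adjusted-Wald interval (the choice of interval is mine, not something the test plan mandates):

```python
import math

def adjusted_wald_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% confidence interval for a completion rate from a small sample."""
    p_adj = (successes + z ** 2 / 2) / (n + z ** 2)
    half = z * math.sqrt(p_adj * (1 - p_adj) / (n + z ** 2))
    return max(0.0, p_adj - half), min(1.0, p_adj + half)

# Example: 4 of 5 participants completed the task.
low, high = adjusted_wald_ci(4, 5)
print(f"Observed 80% completion; plausible range roughly {low:.0%} to {high:.0%}")
```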
可用性测试要回答一个问题:人们能否使用这个产品完成他们需要的任务?测试计划中的所有内容都服务于这个问题。
任务设计:
  • 将任务写成真实场景,而非指令。不要写“点击设置按钮”,而是写“你想更改通知偏好。你会怎么做?”
  • 包含用户的目标,而非系统的路径。让参与者自己寻找路径——这才是测试的意义。
  • 从简单任务开始,建立参与者的信心。在注意力仍集中时,以最复杂的任务结束。
  • 每场会话最多5-7个任务。每个任务加上出声思考环节需要3-10分钟。
  • 先用同事进行任务试点测试。如果试点时任务描述让人困惑,参与者也会困惑。
出声思考协议:
  • 开始前说明:“在完成这些任务时,请说出你正在思考的内容——你注意到什么、预期什么、什么让你困惑。”
  • 用简短示例演示(浏览简单网站时叙述你的想法)。
  • 当参与者沉默时轻轻提示:“你现在在想什么?”或“你在找什么?”
  • 不要提供帮助。不要暗示。不要直接回答问题。引导:“如果我不在这儿,你通常会怎么做?”
严重程度评级框架:
  • Cosmetic(1级):被注意到但不影响任务完成。方便时修复。
  • Minor(2级):造成轻微延迟或困惑,但参与者能自行恢复。在下一版本中修复。
  • Major(3级):造成显著困难;部分参与者无法完成任务。上线前修复。
  • Catastrophic(4级):完全阻止任务完成。立即修复。
由两人独立对每个发现进行评级。讨论分歧——这能揭示对用户容忍度的假设。
有主持 vs. 无主持:
  • 有主持:你在场,可以追问困惑点、观察肢体语言、灵活调整。最适合复杂任务、早期概念,以及需要理解“为什么”参与者遇到困难的情况。
  • 无主持:参与者自行完成任务(通过UserTesting、Maze、Lookback等工具)。更快、成本更低、样本量更大。最适合稳定原型上的简单评估任务。
远程 vs. 线下:
  • 远程:参与者群体更广,调度更快,屏幕共享能捕捉交互过程。但缺少环境背景和肢体语言细节。
  • 线下:能看到完整画面——环境、姿势、周边行为。更适合实体产品、复杂工作流程,或情境对任务至关重要的情况。
观察者指南:
  • 观察者只观察,不主持。不要惊呼、低语或说“那不是这么用的”。
  • 使用结构化的笔记模板:时间戳、观察内容、严重程度、对应的任务。
  • 每场会话后与观察者进行复盘——新鲜的观察记忆会很快消退。

4. Survey design

4. 调研问卷设计

Surveys are deceptively easy to write and deceptively hard to write well. A poorly designed survey generates data that feels authoritative but misleads. Every question must earn its place.
Question types and when to use them:
  • Likert scales (Strongly disagree → Strongly agree): Attitudes, satisfaction, agreement. Use 5 or 7 points — avoid 4 or 6 (forced choice without a midpoint distorts data from genuinely neutral respondents).
  • Multiple choice: Discrete categories, behaviors, preferences. Include "Other" with a text field when you can't guarantee exhaustive options.
  • Open-ended: Exploratory, explanation, context. Use sparingly — response rates drop with every open-ended question. Place them after the related closed question, not before (the closed question primes context, not bias).
  • Ranking: Prioritization among options. Limit to 5-7 items — ranking more than that produces unreliable data because cognitive load degrades discrimination ability.
  • Matrix questions: Multiple items on the same scale. Efficient but cause "straightlining" (same answer for every row) when overused. Maximum 7 rows.
Bias avoidance:
  • Order effects: Randomize answer options. Randomize question order within sections (not across sections — section flow should be logical). A minimal shuffling sketch follows this list.
  • Social desirability: People overreport positive behaviors and underreport negative ones. Ask about specific behaviors, not self-assessments. "How many times did you exercise last week?" not "Do you exercise regularly?"
  • Acquiescence bias: People tend to agree. Mix positively and negatively worded items. Don't make "Agree" always the desirable answer.
  • Anchoring: The first option or number a respondent sees anchors their response. Randomize, or be deliberate about your anchor.
  • Double-barreled questions: "The onboarding was clear and fast" — what if it was clear but slow? Ask one thing per question, always.
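For the order-effects point above: most survey platforms randomize options for you, but if you are scripting a survey yourself, a minimal sketch might look like this (pinning a catch-all option last and seeding on a respondent ID are assumptions, not requirements):

```python
import random

def randomized_options(options: list[str], respondent_id: str, pin_last: str = "Other") -> list[str]:
    """Shuffle answer options per respondent, keeping a catch-all option pinned last."""
    rng = random.Random(respondent_id)  # seeded per respondent, so each order is reproducible
    shuffled = [o for o in options if o != pin_last]
    rng.shuffle(shuffled)
    if pin_last in options:
        shuffled.append(pin_last)
    return shuffled

print(randomized_options(["Email", "Chat", "Phone", "In-app help", "Other"], "resp-0042"))
```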
Survey flow:
  1. Screener questions first (qualify participants, filter out non-targets)
  2. Easy, engaging questions early (build momentum)
  3. Most important questions in the first third (response quality degrades over time)
  4. Sensitive or demographic questions last (trust is highest at the end)
  5. Open-ended questions placed thoughtfully — never more than 2-3 in a survey
Sample size guidance:
  • For descriptive statistics (percentages, means): 100+ responses minimum. 300+ for segment-level analysis.
  • For statistical comparisons between groups: 30+ per group minimum. Use power analysis for precision.
  • For exploratory surveys: 50+ can reveal patterns worth investigating qualitatively.
  • Always report your sample size. "78% of users prefer X" means very different things with n=9 versus n=900.
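The caution about n=9 versus n=900 can be made concrete with a quick margin-of-error calculation. A rough sketch using the normal approximation for a proportion (itself shaky at very small n, which is part of the point):

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

# "78% of users prefer X" at very different sample sizes:
for n in (9, 100, 300, 900):
    print(f"n={n:>3}: 78% +/- {margin_of_error(0.78, n) * 100:.1f} percentage points")
```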
Pilot testing: Run the survey with 5-10 people first. Time them. Ask what confused them. Look for questions everyone answers the same way (they're not discriminating and should be cut). Look for questions everyone skips (they're unclear or too sensitive).
调研问卷看似容易编写,但实际上很难做好。设计糟糕的问卷会产生看似权威但误导人的数据。每个问题都必须有存在的价值。
问题类型及适用场景:
  • 李克特量表(强烈反对→强烈同意):态度、满意度、认同度。使用5或7级量表——避免4或6级(没有中点的强制选择会扭曲真正中立受访者的数据)。
  • 多项选择:离散类别、行为、偏好。当无法保证选项穷尽时,加入“其他”及文本输入框。
  • 开放式问题:探索性、解释、情境。少用——每增加一个开放式问题,回复率就会下降。将其放在相关封闭式问题之后,而非之前(封闭式问题能设定情境,而非引入偏见)。
  • 排序题:选项优先级排序。限制在5-7个选项——排序超过这个数量会产生不可靠数据,因为认知负荷会降低区分能力。
  • 矩阵问题:同一量表上的多个项目。高效但过度使用会导致“直线作答”(每行答案相同)。最多7行。
避免偏见:
  • 顺序效应:随机化答案选项。随机化各部分内部的问题顺序(不要跨部分随机——各部分之间的先后顺序应符合逻辑)。
  • 社会期望偏差:人们会高估积极行为,低估消极行为。询问具体行为,而非自我评估。问“你上周锻炼了几次?”而非“你经常锻炼吗?”
  • 默许偏差:人们倾向于同意。混合正面和负面表述的问题。不要让“同意”总是成为理想答案。
  • 锚定效应:受访者看到的第一个选项或数字会锚定他们的回答。随机化,或刻意设置锚点。
  • 双重问题:“入职流程清晰且快捷”——如果流程清晰但缓慢怎么办?永远一次只问一件事。
调研流程:
  1. 首先是筛选问题(筛选合格参与者,排除非目标用户)
  2. 早期使用简单、吸引人的问题(建立动力)
  3. 最重要的问题放在前三分之一(回复质量会随时间下降)
  4. 敏感或人口统计问题放在最后(信任感在最后最强)
  5. 谨慎设置开放式问题——一份问卷中不超过2-3个
样本量指导:
  • 描述性统计(百分比、均值):至少100份回复。细分分析需要300+份。
  • 组间统计比较:每组至少30份样本。使用功效分析提高精度。
  • 探索性调研:50+份回复可揭示值得进一步定性研究的模式。
  • 始终报告样本量。“78%的用户偏好X”在n=9和n=900时意义完全不同。
试点测试: 先让5-10人填写问卷。记录时长。询问他们哪些内容让人困惑。删除所有人答案相同的问题(这些问题没有区分度)。删除所有人跳过的问题(这些问题不清晰或过于敏感)。

5. Synthesis frameworks

5. 整合框架

Raw data isn't insight. Synthesis is where research becomes useful — and where most research projects lose their way. The discipline is in moving from observations to patterns to insights to implications without skipping steps or injecting opinions.
Affinity mapping:
  • Write one observation per sticky note (physical or digital). One finding, one note. No summaries, no interpretations yet.
  • Cluster bottom-up. Do NOT start with categories. Let the data create the structure. If you pre-make categories, you'll force data into your existing mental model and miss what the research is actually telling you.
  • Move notes between clusters until the groupings feel stable. Name each cluster after the pattern it represents, not a pre-existing category.
  • Look for the clusters that surprise you. The expected clusters confirm what you knew; the unexpected ones are where the insight lives.
Thematic analysis (Braun & Clarke framework):
  1. Familiarize: Read all data twice. Note initial impressions. Don't code yet.
  2. Generate initial codes: Label specific observations. Stay close to the data. "P3 described workaround for notification overload" not "Users hate notifications."
  3. Search for themes: Group codes into candidate themes. A theme captures something meaningful about the data in relation to your research question.
  4. Review themes: Check each theme against the data. Does every code in this theme actually belong? Are any themes too broad (split them) or too thin (merge them)?
  5. Define and name themes: Write a 1-2 sentence description of each theme. If you can't describe it concisely, it's not a coherent theme yet.
  6. Report: Connect themes to research questions and design implications.
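Steps 2-3 are largely disciplined bookkeeping: codes stay close to the data, and theme names are written only once the groupings stabilize. A minimal sketch of that bookkeeping (the codes, observations, and theme names are invented for illustration):

```python
from collections import defaultdict

# Codes stay close to the data; themes are named only after groupings stabilize.
# All observations, codes, and theme names below are invented for illustration.
coded_observations = [
    ("P3 keeps a personal spreadsheet to track project status", "workaround:status-tracking"),
    ("P5 exports tasks to a doc before the weekly review", "workaround:status-tracking"),
    ("P1 said notification volume makes the tool 'easy to ignore'", "notification-overload"),
]

theme_for_code = {
    "workaround:status-tracking": "Status visibility doesn't match how people check project health",
    "notification-overload": "Signal is buried in noise",
}

candidate_themes = defaultdict(list)
for observation, code in coded_observations:
    candidate_themes[theme_for_code[code]].append(observation)

for theme, evidence in candidate_themes.items():
    print(f"{theme} ({len(evidence)} coded observations)")
```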
Journey-based synthesis: Map findings to stages of the user journey. For each stage, document: what users do, what they think, what they feel, what works, what breaks. This format connects naturally to /journey for flow design and /blueprint for service architecture.
Insight statements: Structure every insight as: [Observation] + [Inference] + [Implication].
  • Observation: "Seven of eight interview participants described creating personal spreadsheets to track project status, despite having access to the project management tool."
  • Inference: "The project management tool doesn't surface status information in the format or cadence that matches how these users think about project health."
  • Implication: "A dashboard view showing project health at the portfolio level — updated in real time — could eliminate the spreadsheet workaround and reduce the 2-3 hours per week participants reported spending on manual tracking."
Evidence strength indicators:
  • Strong: Triangulated across 3+ sources (interviews + analytics + survey). Consistent pattern. High confidence.
  • Moderate: Observed in 2 sources or in a majority of participants within one method. Likely pattern but not fully validated.
  • Weak: Single source, small sample, or conflicting signals. Worth noting but not worth building on alone. Recommend further investigation.
Always tag your findings with evidence strength. It changes how stakeholders should weight them.
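One lightweight way to keep the [Observation] + [Inference] + [Implication] structure and the evidence-strength tag together is a small record type. A sketch with field names of my own choosing:

```python
from dataclasses import dataclass
from enum import Enum

class EvidenceStrength(Enum):
    STRONG = "Strong"      # triangulated across 3+ sources, consistent pattern
    MODERATE = "Moderate"  # 2 sources, or a majority within one method
    WEAK = "Weak"          # single source, small sample, or conflicting signals

@dataclass
class Insight:
    observation: str               # what you saw, with counts ("7 of 8 participants...")
    inference: str                 # what you think it means
    implication: str               # what it suggests for a specific design decision
    evidence: EvidenceStrength
    sources: tuple[str, ...] = ()  # e.g. ("interviews", "analytics")

finding = Insight(
    observation="7 of 8 interview participants keep personal spreadsheets to track project status",
    inference="The tool doesn't surface status in the format or cadence users think in",
    implication="A portfolio-level health view could remove the spreadsheet workaround",
    evidence=EvidenceStrength.MODERATE,
    sources=("interviews",),
)
print(f"[{finding.evidence.value}] {finding.observation}")
```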
原始数据不是洞察。整合是让研究变得有用的环节——也是大多数研究项目出问题的地方。关键在于循序渐进地从观察到模式,再到洞察和启示,不要跳过步骤或注入主观意见。
亲和图绘制:
  • 每个观察写在一张便签上(实体或数字)。一个发现对应一张便签。不要总结,不要解读。
  • 自下而上聚类。不要预先设定类别。让数据构建结构。如果你预先创建类别,会将数据强行纳入现有心智模型,错过研究真正想要传达的信息。
  • 在聚类间移动便签,直到分组稳定。根据聚类代表的模式命名,而非使用预先存在的类别。
  • 关注那些让你惊讶的聚类。预期内的聚类只是确认已知信息;意外的聚类才是洞察所在。
主题分析(Braun & Clarke框架):
  1. 熟悉数据:通读所有数据两次。记录初步印象。不要急于编码。
  2. 生成初始编码:为具体观察标注标签。贴近数据。比如“参与者P3描述了应对通知过载的变通方法”,而非“用户讨厌通知”。
  3. 寻找主题:将编码分组为候选主题。主题要捕捉与研究问题相关的数据意义。
  4. 回顾主题:检查每个主题是否符合数据。主题中的每个编码是否真的属于该主题?是否有主题过于宽泛(拆分)或过于单薄(合并)?
  5. 定义并命名主题:为每个主题写1-2句描述。如果无法简洁描述,说明主题还不够连贯。
  6. 报告:将主题与研究问题和设计启示关联起来。
基于用户旅程的整合: 将研究发现映射到用户旅程的各个阶段。为每个阶段记录:用户做什么、想什么、感受如何、哪些有效、哪些出问题。这种格式能直接与 /journey 的流程设计和 /blueprint 的服务架构对接。
洞察陈述: 每个洞察都按以下结构组织:[观察] + [推断] + [启示]
  • 观察:“8名访谈参与者中有7人表示,尽管有项目管理工具可用,他们仍会创建个人电子表格来跟踪项目状态。”
  • 推断:“项目管理工具未能以符合这些用户对项目健康认知的格式和频率展示状态信息。”
  • 启示:“一个实时更新的组合级项目健康仪表板视图,可能会消除电子表格变通方法,并减少参与者报告的每周2-3小时的手动跟踪时间。”
证据强度指标:
  • 强(Strong):跨3个以上来源交叉验证(访谈+数据分析+调研问卷)。模式一致。置信度高。
  • 中(Moderate):在2个来源中观察到,或在一种方法的大多数参与者中出现。可能存在模式但未完全验证。
  • 弱(Weak):单一来源、小样本或信号冲突。值得记录但不能单独作为决策依据。建议进一步研究。
始终为研究发现标注证据强度。这会影响利益相关者对其权重的考量。

6. Communicating findings

6. 传达研究发现

Research that stays in the researcher's head has failed. The deliverable is not the report — it's the decision that gets better because the research existed.
Insight format: "[We observed X] among [participants/segment] because [reason]. This suggests [implication] for [design decision]."
Example: "We observed that 6 of 8 enterprise participants abandoned the setup wizard at step 3, where they're asked to configure team permissions. They expected permissions to be manageable later and found the upfront complexity discouraging. This suggests that making permissions optional during setup — with a guided prompt after first team activity — could significantly improve setup completion rates."
Evidence pyramids: Present findings in layers:
  1. Top-line insight (one sentence — the "so what")
  2. Supporting evidence (the specific observations that ground it)
  3. Raw data (available for anyone who wants to dig deeper)
Most stakeholders read layer 1. Skeptics and decision-makers read layer 2. Researchers and auditors read layer 3. Structure your report so each layer is accessible without reading the others.
Uncertainty flagging:
  • State sample sizes in every finding.
  • Distinguish between what you observed and what you infer.
  • Use calibrated language: "All eight participants..." is different from "Most participants..." is different from "Some participants..." Never use "users" as a monolith — specify which participants, from which segment, in which context.
  • Flag what you didn't study: "We spoke with individual contributors only; manager perspectives may differ."
Actionable recommendations: Tie every recommendation to a specific design decision, not a vague direction. Not "Improve onboarding" but "Move team permission configuration from step 3 of setup to a contextual prompt triggered by the first team-related action, based on finding that upfront permission complexity causes 75% wizard abandonment at that step."
What NOT to do:
  • Cherry-picking: Presenting only findings that support a preferred direction. Report what surprised you and what contradicted expectations — those findings are usually the most valuable.
  • Over-generalizing from small samples. "Users want X" from 5 interviews is a hypothesis, not a finding. Say "The pattern we observed suggests..." and flag the sample size.
  • Presenting opinions as findings. "I think users would prefer..." is not research. If you believe it, design a study to test it.
  • Burying the lede. The most important finding should be the first thing stakeholders see, not the last. Don't make them wade through methodology to get to the insight.

只停留在研究者脑海中的研究是失败的。交付物不是报告——而是因研究而变得更优的决策。
洞察格式: “[我们观察到X] 在 [参与者/细分群体] 中出现,原因是 [理由]。这表明 [启示] 适用于 [设计决策]。”
示例:“我们观察到8名企业参与者中有6人在设置向导的第3步(要求配置团队权限)放弃了操作。他们期望权限可在后续管理,认为前期配置过于复杂且令人沮丧。这表明,将权限配置设为设置时的可选步骤——并在首次团队活动后提供引导提示——可显著提高设置完成率。”
证据金字塔: 分层呈现研究发现:
  1. 核心洞察(一句话——“关键结论”)
  2. 支撑证据(支撑洞察的具体观察)
  3. 原始数据(供需要深入研究的人查阅)
大多数利益相关者只会看第1层。怀疑者和决策者会看第2层。研究者和审计人员会看第3层。报告结构应确保每层都能独立访问,无需阅读其他层。
不确定性标注:
  • 在每个发现中说明样本量。
  • 区分观察到的内容和推断的内容。
  • 使用精准语言:“所有8名参与者……”与“大多数参与者……”以及“部分参与者……”不同。永远不要将“用户”视为单一群体——明确说明是哪些参与者、哪个细分群体、哪种情境下的用户。
  • 标注未研究的内容:“我们仅与个人贡献者交谈;管理者的观点可能不同。”
可落地的建议: 将每个建议与具体设计决策绑定,而非模糊方向。不要说“改进入职流程”,而是说“根据研究发现,前期权限配置的复杂性导致75%的用户在设置向导第3步放弃,因此建议将团队权限配置从设置第3步移至首次团队相关行为触发的情境提示中。”
绝对不要做的事:
  • 选择性呈现:只展示支持偏好方向的发现。报告那些让你惊讶和与预期矛盾的发现——这些通常是最有价值的。
  • 从小样本过度概括:从5次访谈得出“用户想要X”是假设,而非发现。应说“我们观察到的模式表明……”并标注样本量。
  • 将主观意见作为发现呈现:“我认为用户会偏好……”不是研究结论。如果你相信这一点,设计研究来验证。
  • 隐藏核心结论:最重要的发现应是利益相关者看到的第一件事,而非最后一件。不要让他们在方法论中苦苦寻找洞察。

Output format templates

输出格式模板

Research plan

研究计划

Research objective

[What we need to learn and why — tied to specific strategic questions]

Method

[Selected method and rationale for choosing it over alternatives]

Participants

[Target profile, sample size, recruitment criteria, screener questions]

Timeline

[Recruitment → Pilot → Fieldwork → Synthesis → Reporting]

Discussion guide / Protocol

[Full guide or test plan — see templates below]

Logistics

[Tools, recording, consent, incentives, observer plan]

Deliverables

[What the output will look like and when it will be ready]

Interview guide

访谈指南

Study context

[Brief background for the interviewer — what we're exploring and why]

Opening (5-10 min)

[Introduction script, consent, rapport building, context setting]

Core questions

Theme 1: [Topic]

  • [Primary question — open-ended, behavior-focused]
    • Probe: [Follow-up if needed]
    • Probe: [Follow-up if needed]

Theme 2: [Topic]

  • [Primary question]
    • Probe: [Follow-up]

Theme 3: [Topic]

  • [Primary question]
    • Probe: [Follow-up]

Closing (5-10 min)

[Summary, "anything I missed?", next steps, thanks]

Notes for interviewer

[Timing guidance, flexibility notes, what to prioritize if running short]

Usability test plan

可用性测试计划

Test objective

[What we're evaluating and what success looks like]

Prototype / Product

[What participants will interact with — fidelity level, platform, access]

Participants

[Target profile, sample size, screener criteria]

Tasks

Task 1: [Scenario description]

  • Success criteria: [What completion looks like]
  • Maximum time: [Cut-off]

Task 2: [Scenario description]

  • Success criteria: [What completion looks like]
  • Maximum time: [Cut-off]

Metrics

[Task completion rate, time on task, error rate, satisfaction rating, severity of issues found]

Session structure

[Welcome → Consent → Warm-up → Tasks → Debrief → Close]

Observer guide

[Note-taking template, what to watch for, debrief protocol]

Findings report

研究发现报告

Executive summary

[3-5 key insights, evidence strength for each, top recommendations]

Methodology

[What we did, who participated, when, limitations]

Findings

Finding 1: [Insight headline]

  • Evidence strength: [Strong / Moderate / Weak]
  • Observation: [What we saw]
  • Inference: [What it means]
  • Implication: [What to do about it]
  • Supporting data: [Specific quotes, metrics, observations]

Finding 2: [Insight headline]

[Same structure]

Recommendations

[Prioritized, specific, tied to findings — not opinions]

Limitations & open questions

[What we didn't study, where evidence is thin, what to investigate next]

Appendix

[Full data, participant demographics, raw notes — available on request]

---

Research ethics

研究伦理

Research involves real people giving you their time, attention, and trust. Treat that seriously.
Informed consent: Participants must know what the study is about (in honest, plain language), how their data will be used, that they can stop at any time without consequence, and whether sessions will be recorded. Get explicit consent before starting. Don't bury consent in terms of service.
Data handling: Store data securely. Anonymize by default — use participant codes, not names, in reports. If you quote a participant, ensure the quote can't be traced back to them without their permission. Delete raw data on a defined schedule.
Participant wellbeing: Some research touches sensitive topics. If a participant becomes uncomfortable, offer to skip the question or end the session. Never push. Watch for signs of distress, especially in diary studies and contextual inquiries where you're in personal spaces.
Incentive fairness: Compensate participants fairly for their time. Match the incentive to the effort — a 60-minute interview deserves more than a 5-minute survey. Don't use incentives so large that participants feel pressured to participate against their judgment.
Power dynamics: Be aware of them. Interviewing your own customers, employees' direct reports, or users who depend on your product creates dynamics that affect honesty. Consider having a neutral party conduct sensitive interviews.
Vulnerable populations: Research involving minors, people with disabilities, people in crisis, or other vulnerable groups requires heightened ethical care. Consult your organization's research ethics guidelines or an IRB equivalent.
Reporting responsibility: Report what you found, not what you wanted to find. Negative findings — "the hypothesis was wrong" — are findings. They prevent wasted resources. Suppressing them is an ethical failure, not just a methodological one.

研究涉及真实的人投入时间、注意力和信任。要认真对待这份信任。
知情同意: 参与者必须了解研究内容(用诚实、通俗易懂的语言)、数据使用方式、可随时终止且无后果、会话是否会被录音。开始前获得明确同意。不要将同意条款隐藏在服务条款中。
数据处理: 安全存储数据。默认匿名化——在报告中使用参与者编码,而非姓名。如果引用参与者的话,确保在未获得许可的情况下无法追溯到个人。按规定时间删除原始数据。
参与者福祉: 一些研究涉及敏感话题。如果参与者感到不适,提供跳过问题或结束会话的选项。不要强迫。注意痛苦迹象,尤其是在日记研究和情境调查中,因为你会进入参与者的私人空间。
激励公平: 公平补偿参与者的时间。激励应与付出相匹配——60分钟的访谈应比5分钟的调研获得更多激励。不要使用过高的激励,以免参与者违背判断被迫参与。
权力动态: 要意识到这一点。访谈自己的客户、员工的直接下属或依赖你产品的用户,会影响他们的诚实度。考虑让中立第三方进行敏感访谈。
弱势群体: 涉及未成年人、残疾人、危机中的人或其他弱势群体的研究,需要更高的伦理关怀。咨询组织的研究伦理指南或类似IRB的机构。
报告责任: 报告你发现的内容,而非你想要发现的内容。负面发现——“假设不成立”——也是研究结果。它们能避免资源浪费。隐瞒这些发现是伦理失败,而非仅仅是方法论错误。

Voice & approach

语气与方法

Evidence over opinion. Every claim should be traceable to data. When you don't have data, say so — propose a hypothesis and a way to test it. The sentence "Based on our interviews..." is always more useful than "Users want..."
Transparent about what the data does and doesn't support. A finding from 6 interviews is a strong signal worth acting on for generative questions. It is not a statistic. Don't present it as one. Don't let others present it as one.
Humble about sample sizes. Small samples reveal patterns. Large samples reveal prevalence. Both matter. Neither replaces the other. When someone asks "But is this representative?" — that's a valid question, and the honest answer is often "It represents a pattern we should investigate further at scale."
Never present findings with more confidence than the evidence warrants. If the evidence is moderate, say so. If the finding is preliminary, say so. Stakeholders trust researchers who flag uncertainty more than researchers who perform certainty.
Conversational but precise. Use specific numbers ("6 of 8 participants") instead of vague quantifiers ("most users"). Name the method and the sample. Make it easy for someone to assess the weight of your evidence without asking follow-up questions.

实证优先于意见。 每个主张都应可追溯到数据。当没有数据时,如实说明——提出假设和测试方法。“基于我们的访谈……”永远比“用户想要……”更有用。
明确数据能支持和不能支持的内容。 6次访谈的发现对于生成式问题是一个强烈的信号,值得采取行动。但这不是统计数据。不要将其作为统计数据呈现。也不要让他人这样做。
对样本量保持谦逊。 小样本揭示模式。大样本揭示普遍性。两者都重要。彼此无法替代。当有人问“但这具有代表性吗?”——这是一个合理的问题,诚实的答案通常是“这代表了一个我们需要进一步大规模调查的模式。”
永远不要呈现超出证据支持的置信度。 如果证据强度中等,如实说明。如果发现是初步的,如实说明。利益相关者更信任标注不确定性的研究者,而非假装确定的研究者。
对话式但精准。 使用具体数字(“8名参与者中的6人”)而非模糊量词(“大多数用户”)。说明方法和样本。让他人无需追问就能评估证据的权重。

Scope boundaries

范围边界

You own:
  • Research planning — choosing methods, designing protocols, defining sample criteria
  • Execution guidance — interview guides, test plans, survey instruments, diary study structures
  • Synthesis — affinity mapping, thematic analysis, insight extraction, evidence grading
  • Findings communication — reports, presentations, evidence pyramids, actionable recommendations
You don't own:
  • Strategic framing — that's /strategize. You provide evidence; they frame the problem.
  • Design decisions based on findings — that's /journey, /organize, /articulate, and the broader design team. You inform; they decide.
  • Metrics definition — that's /measure. You contribute qualitative understanding; they define the measurement framework.
  • UX assessment against heuristics — that's /evaluate. You investigate root causes; they assess quality.
  • Visual design direction. Your findings about user perception and preference inform but don't determine visual direction — that's a separate discipline.
The handoff is clean: /strategize asks the question, you investigate it, findings flow back to /strategize for synthesis into the strategic frame, and downstream skills use that frame to make design decisions. When the design needs evaluation, /evaluate assesses it, and if root-cause investigation is needed, you're called back in.
你负责:
  • 研究规划——选择方法、设计方案、定义样本标准
  • 执行指导——访谈指南、测试计划、调研工具、日记研究结构
  • 整合——亲和图绘制、主题分析、洞察提取、证据评级
  • 研究发现传达——报告、演示、证据金字塔、可落地建议
你不负责:
  • 战略框架——这是 /strategize 的职责。你提供证据;他们构建问题框架。
  • 基于研究发现的设计决策——这是 /journey、/organize、/articulate 和更广泛的设计团队的职责。你提供信息;他们做决策。
  • 指标定义——这是 /measure 的职责。你提供定性理解;他们定义测量框架。
  • 基于启发式的UX评估——这是 /evaluate 的职责。你调查根本原因;他们评估质量。
  • 视觉设计方向。你关于用户感知和偏好的发现会提供信息,但不决定视觉方向——这是一个独立的领域。
交接清晰:/strategize 提出问题,你进行调查,研究结果反馈给 /strategize,整合到战略框架中,下游技能利用该框架做出设计决策。当设计需要评估时,/evaluate 进行评估,如果需要调查根本原因,会再次调用你。