technical-interviewing

When this skill is activated, always start your first response with the 🧢 emoji.

Technical Interviewing

Technical interviewing is both a skill and a system. The goal is not to find the "smartest" candidate - it is to predict on-the-job performance with high signal and low noise while treating every candidate with respect. A well-designed interview loop uses structured questions, clear rubrics, and calibrated interviewers to make consistent, defensible hiring decisions. This skill covers the full lifecycle: designing coding challenges, structuring system design rounds, building rubrics, calibrating panels, and reducing bias.

When to use this skill

Trigger this skill when the user:

Wants to design a coding challenge or take-home assignment for a specific role
Needs to create a system design interview question with follow-ups
Asks to build a scoring rubric or evaluation criteria for interviews
Wants to structure a full interview loop (phone screen through onsite)
Needs to calibrate interviewers or run a calibration session
Asks about reducing bias in technical assessments
Wants to evaluate a candidate's performance against a rubric
Needs interviewer training materials or shadow guides

Do NOT trigger this skill for:

Preparing as a candidate for interviews (use system-design or algorithm skills)
General HR hiring workflows not specific to technical assessment

Key principles

Structure over gut feel - Every question must have a rubric before it is used. "I'll know a good answer when I see it" is not a rubric. Define what strong, acceptable, and weak look like in advance. Structured interviews are commonly reported to be about 2x more predictive of job performance than unstructured ones.
Signal-to-noise ratio - Each question should test exactly one or two competencies. If a coding question tests algorithms, data structures, API design, and communication simultaneously, you cannot isolate what the candidate is actually good or bad at. Separate the signals.
Calibrate constantly - The same "strong" performance should get the same score regardless of which interviewer runs the session. Run calibration exercises quarterly using recorded or written mock answers.
Respect the candidate's time - Take-homes should take 2-4 hours max (state this explicitly). Onsite loops should not exceed 4-5 hours. Every minute of the candidate's time should produce meaningful signal.
Reduce bias systematically - Use identical questions per role, score before discussing with other interviewers, avoid anchoring on resume prestige, and ensure your rubric tests skills not proxies (e.g. "uses our preferred framework" is a proxy, not a skill).

Core concepts

The interview funnel

Every technical hiring loop follows a narrowing funnel. Each stage should have a clear purpose and avoid re-testing what was already assessed:

Stage	Purpose	Duration	Signal
Resume screen	Baseline qualifications	2-5 min	Experience match
Phone screen	Communication + baseline coding	30-45 min	Can they code at all?
Technical deep-dive	Core competency for the role	45-60 min	Domain strength
System design	Architecture thinking (senior+)	45-60 min	Scope, trade-offs
Culture/values	Team fit, collaboration style	30-45 min	Working style

Question types

Algorithmic - Data structures, complexity analysis. Best for junior/mid roles. Risk: over-indexes on contest skills vs real work.
Practical coding - Build a small feature, debug existing code, extend an API. Better signal for day-to-day work.
System design - Design a URL shortener, notification system, rate limiter. Best for senior+ roles. Tests breadth and trade-off reasoning.
Code review - Review a PR with intentional issues. Tests reading skill and communication.
Take-home - Larger project done asynchronously. Best signal but highest candidate time cost.

Rubric anatomy

Every rubric has four components:

Competency - What you are testing (e.g. "API design")
Levels - Typically 4: Strong Hire, Hire, No Hire, Strong No Hire
Behavioral anchors - Concrete examples of what each level looks like
Must-haves vs nice-to-haves - Which criteria are required vs bonus
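
The four components can be sketched as a small data structure. This is a minimal illustration, not a prescribed schema: the names `Rubric`, `Anchor`, and `missing_levels` are hypothetical, and the sample anchors are invented for the example.

```python
from dataclasses import dataclass, field

# The four levels named in the text, ordered strongest to weakest.
LEVELS = ("Strong Hire", "Hire", "No Hire", "Strong No Hire")


@dataclass
class Anchor:
    """One observable behavior tied to a level (hypothetical structure)."""
    level: str
    behavior: str
    must_have: bool = False  # True = required criterion, False = bonus


@dataclass
class Rubric:
    competency: str
    anchors: list[Anchor] = field(default_factory=list)

    def missing_levels(self) -> list[str]:
        """Return the levels that have no behavioral anchor yet."""
        covered = {a.level for a in self.anchors}
        return [level for level in LEVELS if level not in covered]


# Invented sample data for illustration only.
rubric = Rubric(
    competency="API design",
    anchors=[
        Anchor("Strong Hire", "Proposes versioning and pagination unprompted"),
        Anchor("Hire", "Defines consistent resource names and error codes", must_have=True),
        Anchor("No Hire", "Endpoints work but naming and errors are inconsistent"),
    ],
)
print(rubric.missing_levels())  # ['Strong No Hire']
```

A check like `missing_levels` is one way to catch a rubric that is put into use before every level has a concrete anchor.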

Common tasks

Design a coding challenge

Start with the role requirements, not a clever problem. Work backward:

Identify 1-2 core competencies the role needs daily
Design a problem that requires those competencies to solve
Create 3 difficulty tiers: base case, standard, extension
Write the rubric before finalizing the problem
Test-solve it yourself and time it (multiply your own time by 1.5-2x to estimate how long candidates will need)

Template:

PROBLEM: <Title>
LEVEL: Junior / Mid / Senior
TIME: <X> minutes
COMPETENCIES TESTED: <1-2 specific skills>

PROMPT:
  <Clear problem statement with examples>

BASE CASE (must complete):
  <Minimum viable solution criteria>

STANDARD (expected for hire):
  <Additional requirements showing solid understanding>

EXTENSION (differentiates strong hire):
  <Follow-up that tests depth or edge case thinking>

RUBRIC:
  Strong Hire: Completes standard + extension, clean code, discusses trade-offs
  Hire: Completes standard, reasonable code quality, handles edge cases when prompted
  No Hire: Completes base only, significant code quality issues
  Strong No Hire: Cannot complete base case, fundamental misunderstandings
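
The test-solve timing step is simple arithmetic, sketched below. The helper names `candidate_time_range` and `fits_slot` are hypothetical. Checking against the slower 2x estimate is an assumption made here to stay conservative.

```python
def candidate_time_range(own_minutes: float) -> tuple[float, float]:
    """Scale the author's test-solve time by the 1.5-2x guideline."""
    return own_minutes * 1.5, own_minutes * 2.0


def fits_slot(own_minutes: float, slot_minutes: float) -> bool:
    """True if even the slower (2x) estimate fits in the interview slot."""
    _, upper = candidate_time_range(own_minutes)
    return upper <= slot_minutes


print(candidate_time_range(20))  # (30.0, 40.0)
print(fits_slot(20, 45))         # True
print(fits_slot(25, 45))         # False
```

If the problem does not fit, trimming the extension tier is usually less disruptive than cutting the base case.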

Create a system design question

Good system design questions are open-ended with clear scaling dimensions:

Pick a system the candidate likely understands as a user
Define initial constraints (users, QPS, data volume)
Prepare 4-6 follow-up dimensions to probe depth
Write what "good" looks like at each stage

Follow-up dimensions to prepare:

Scale: "Now handle 10x the traffic"
Reliability: "A database node goes down - what happens?"
Consistency: "Two users edit the same document simultaneously"
Cost: "The CEO says infrastructure costs are too high"
Latency: "P99 latency must be under 200ms"
Security: "How do you handle authentication and authorization?"

Build a scoring rubric

For each competency being assessed:

COMPETENCY: <Name>
WEIGHT: <High / Medium / Low>

STRONG HIRE (4):
  - <Specific observable behavior>
  - <Specific observable behavior>

HIRE (3):
  - <Specific observable behavior>
  - <Specific observable behavior>

NO HIRE (2):
  - <Specific observable behavior>

STRONG NO HIRE (1):
  - <Specific observable behavior>

Always use behavioral anchors (what you observed), not trait labels ("smart", "passionate"). "Identified the race condition without prompting and proposed a lock-based solution" is a behavioral anchor. "Seemed smart" is not.
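
The template records a WEIGHT and a 1-4 score per competency but does not say how to combine them. One possible approach is a weighted average, sketched below. The numeric mapping in `WEIGHTS` and the function `weighted_score` are assumptions for illustration, not part of the template.

```python
# Assumed numeric weights for the template's High / Medium / Low labels.
WEIGHTS = {"High": 3, "Medium": 2, "Low": 1}


def weighted_score(scores: dict[str, tuple[str, int]]) -> float:
    """Combine per-competency scores (1-4) into one weighted average.

    `scores` maps competency name -> (weight label, score).
    """
    total_weight = 0
    total = 0
    for competency, (weight_label, score) in scores.items():
        if not 1 <= score <= 4:
            raise ValueError(f"{competency}: score must be 1-4, got {score}")
        weight = WEIGHTS[weight_label]
        total_weight += weight
        total += weight * score
    if total_weight == 0:
        raise ValueError("no competencies scored")
    return total / total_weight


# Invented sample scores for illustration only.
example = {
    "API design": ("High", 4),
    "Testing": ("Medium", 3),
    "Tooling familiarity": ("Low", 2),
}
print(round(weighted_score(example), 2))  # 3.33
```

A single number is a summary for the debrief, not a substitute for the anchors. The observed behaviors behind each score should still be written down.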

Structure a full interview loop

Map each stage to a unique competency. Never duplicate signals:

ROLE: <Title, Level>
TOTAL STAGES: <N>

Stage 1 - Phone Screen (45 min)
  Interviewer type: Any engineer
  Format: Practical coding
  Tests: Baseline coding ability, communication
  Question: <Specific question or question bank ID>

Stage 2 - Technical Deep-Dive (60 min)
  Interviewer type: Domain expert
  Format: Domain-specific coding
  Tests: <Role-specific competency>
  Question: <Specific question>

Stage 3 - System Design (60 min)  [Senior+ only]
  Interviewer type: Senior+ engineer
  Format: Whiteboard / virtual whiteboard
  Tests: Architecture thinking, trade-off reasoning
  Question: <Specific question>

Stage 4 - Culture & Collaboration (45 min)
  Interviewer type: Cross-functional partner
  Format: Behavioral + scenario-based
  Tests: Collaboration, conflict resolution, ownership
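
The "never duplicate signals" rule can be checked mechanically once a loop is written down. The sketch below uses a hypothetical loop definition with an overlap planted on purpose. The dictionary layout, `duplicated_competencies`, and `MAX_LOOP_MINUTES` are assumptions for illustration.

```python
from collections import defaultdict

MAX_LOOP_MINUTES = 5 * 60  # upper bound taken from the 4-5 hour guideline


def duplicated_competencies(stages: dict[str, dict]) -> dict[str, list[str]]:
    """Return competencies that are tested in more than one stage."""
    seen = defaultdict(list)
    for stage_name, stage in stages.items():
        for competency in stage["tests"]:
            seen[competency.lower()].append(stage_name)
    return {c: names for c, names in seen.items() if len(names) > 1}


# Hypothetical loop; "Trade-off reasoning" is deliberately listed twice.
loop = {
    "Phone Screen": {"minutes": 45, "tests": ["Baseline coding", "Communication"]},
    "Technical Deep-Dive": {"minutes": 60, "tests": ["Data modeling", "Trade-off reasoning"]},
    "System Design": {"minutes": 60, "tests": ["Architecture thinking", "Trade-off reasoning"]},
    "Culture & Collaboration": {"minutes": 45, "tests": ["Conflict resolution", "Ownership"]},
}

print(duplicated_competencies(loop))
# {'trade-off reasoning': ['Technical Deep-Dive', 'System Design']}

# Counts every stage, including the phone screen, as a conservative check.
print(sum(s["minutes"] for s in loop.values()) <= MAX_LOOP_MINUTES)  # True
```

Matching on lowercase names only catches exact repeats. Overlaps phrased differently (for example "trade-offs" vs "trade-off reasoning") still need a human review.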

Run a calibration session

Calibration aligns interviewers on what each rubric level means:

Select 3-4 real or mock candidate responses (anonymized)
Have each interviewer score independently using the rubric
Reveal scores simultaneously (avoid anchoring)
Discuss disagreements - focus on which rubric criteria were interpreted differently
Update rubric language where ambiguity caused divergence
Document decisions as "calibration notes" appended to the rubric

Target: interviewers should agree within 1 point on a 4-point scale at least 80% of the time.
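
The agreement target can be measured after each session. The sketch below reads "agree within 1 point" as a pairwise comparison between interviewers on the same response. That reading is an assumption, as is the name `agreement_rate`, and the scores are invented.

```python
from itertools import combinations


def agreement_rate(scores_by_response: list[list[int]], tolerance: int = 1) -> float:
    """Fraction of interviewer pairs whose scores differ by <= tolerance."""
    agree = 0
    pairs = 0
    for scores in scores_by_response:
        # Compare every pair of interviewers who scored this response.
        for a, b in combinations(scores, 2):
            pairs += 1
            if abs(a - b) <= tolerance:
                agree += 1
    if pairs == 0:
        raise ValueError("need at least two scores for one response")
    return agree / pairs


# Invented scores: three interviewers, three mock responses, 4-point scale.
session = [
    [4, 3, 3],  # response A
    [2, 2, 4],  # response B
    [1, 2, 1],  # response C
]
rate = agreement_rate(session)
print(f"{rate:.0%}", rate >= 0.8)  # 78% False
```

Here 7 of 9 pairs agree, which falls short of the 80% target. Response B is where the discussion of rubric wording should start.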

Design a take-home assignment

Take-homes must balance signal quality with respect for candidate time:

State the expected time explicitly (2-4 hours)
Provide a starter repo with boilerplate already set up
Define submission format and evaluation criteria upfront
Include a README template for candidates to explain their approach
Grade with a rubric, not vibes
Offer a live follow-up to discuss the submission (15-30 min)

Anti-patterns / common mistakes

Mistake	Why it's wrong	What to do instead
No rubric before interviews	Every interviewer uses different criteria; inconsistent decisions	Write and distribute rubric before any candidate is interviewed
Asking trivia questions	Tests memorization, not ability; alienates strong candidates	Ask problems that require reasoning, not recall
"Culture fit" as veto	Becomes a proxy for demographic similarity	Define specific values and behaviors you are testing for
Same question for all levels	Junior and senior roles need different signal	Adjust complexity and expected depth per level
Discussing candidates before scoring	First opinion anchors everyone else	Score independently, then debrief
Marathon interviews (6+ hours)	Candidate fatigue degrades signal; disrespects their time	Cap at 4-5 hours including breaks
Only testing algorithms	Most roles rarely use graph traversal; poor signal for day-to-day work	Match question type to actual job tasks
No interviewer training	Untrained interviewers ask leading questions, give inconsistent hints	Run shadow sessions and calibration quarterly

References

For detailed guidance on specific topics, read the relevant file from the references/ folder:

references/system-design-questions.md - Library of system design questions organized by level with expected discussion points and rubric anchors
references/coding-challenge-patterns.md - Coding challenge templates organized by competency signal (API design, data modeling, debugging, concurrency)
references/rubric-calibration.md - Step-by-step calibration session guide with sample scoring exercises and facilitator script
