tiered-test-generator

Tiered Test Generator

Multi-level verification question and test scenario generator.

Rules (Absolute)

  1. Questions must be answerable. Every question has a definite correct answer or clear evaluation criteria. No subjective trick questions.
  2. Difficulty must be genuine. Tier 3 questions should be genuinely hard, not just verbose versions of Tier 1.
  3. Coverage must be systematic. Questions should cover the full topic, not cluster around one subtopic.
  4. Source traceability. For code-based questions, every question must reference specific files, lines, or behaviors.
  5. No answer leakage. Questions must not contain hints that give away the answer. After generating questions, re-read each one and verify no phrasing, emphasis, or structural pattern reveals the correct answer.
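Rule 3 can be spot-checked mechanically once each question is tagged with a subtopic. The following is a minimal sketch, not part of the skill itself: the `coverage_report` helper, the subtopic labels, and the 50% `max_share` threshold are all assumptions chosen for illustration.

```python
from collections import Counter

def coverage_report(question_subtopics, all_subtopics, max_share=0.5):
    """Flag subtopics with no questions and subtopics that dominate the set.

    question_subtopics: one subtopic label per generated question.
    all_subtopics: every subtopic the test is expected to cover.
    max_share: hypothetical cutoff for the fraction of questions one subtopic may take.
    """
    counts = Counter(question_subtopics)
    total = len(question_subtopics)
    missing = sorted(set(all_subtopics) - set(counts))
    overweight = sorted(
        topic for topic, n in counts.items() if total and n / total > max_share
    )
    return {"missing": missing, "overweight": overweight}

# Six of nine questions target "auth" and none target "storage".
report = coverage_report(
    ["auth"] * 6 + ["routing"] * 3,
    ["auth", "routing", "storage"],
)
```

Here `report` lists `storage` as missing and `auth` as overweight, which signals that the set clusters around one subtopic and should be regenerated.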

Tier System

Tier 1: Conceptual (Understanding)

  • Difficulty: Foundation
  • Tests: Can you explain what this does and why?
  • Format: Definition, purpose, comparison questions
  • Bloom's Level: Remember, Understand

Tier 2: Applied (Usage)

  • Difficulty: Intermediate
  • Tests: Can you use this correctly in context?
  • Format: Scenario-based, debugging, "what happens when" questions
  • Bloom's Level: Apply, Analyze

Tier 3: Expert (Mastery)

  • Difficulty: Advanced
  • Tests: Can you handle edge cases, design alternatives, and teach it?
  • Format: Edge cases, trade-off analysis, design challenges, "teach this to someone" prompts
  • Bloom's Level: Evaluate, Create
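The three tiers can be captured as plain data so that tagging and later scoring stay consistent. This is one possible representation, assuming a hypothetical `Tier` record and `tier_tag` helper; the field values are taken from the tier descriptions above.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Tier:
    level: int                    # 1, 2, or 3
    name: str                     # Conceptual / Applied / Expert
    difficulty: str               # Foundation / Intermediate / Advanced
    bloom_levels: Tuple[str, ...]

TIERS = (
    Tier(1, "Conceptual", "Foundation", ("Remember", "Understand")),
    Tier(2, "Applied", "Intermediate", ("Apply", "Analyze")),
    Tier(3, "Expert", "Advanced", ("Evaluate", "Create")),
)

def tier_tag(tier: Tier) -> str:
    """Return the bracketed tag used to label questions, such as "[T2]"."""
    return f"[T{tier.level}]"
```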

Test Types

Type A: Code Comprehension

For testing understanding of specific code.
````markdown
**[T1] Q1.** What is the primary responsibility of the `UserService` class?
a) Database access
b) Authentication
c) User CRUD operations
d) Session management

**[T2] Q2.** Given this function, what happens when `input` is `None`?
```python
def process(input):
    return input.strip().lower()
```
a) Returns empty string
b) Raises AttributeError
c) Returns None
d) Silently fails

**[T3] Q3.** The current error handling in `api/routes.py:45-60` catches all exceptions generically. Design a more robust error handling strategy that:
- Distinguishes client errors from server errors
- Provides actionable error messages
- Doesn't leak internal details
- Supports error aggregation for monitoring
````

Type B: Architecture & Design

For testing system-level understanding.
```markdown
**[T1] Q1.** What architectural pattern does this codebase follow?

**[T2] Q2.** If read traffic increases 100x, which component becomes the bottleneck first? What's your mitigation strategy?

**[T3] Q3.** The current system uses synchronous inter-service communication. Design a migration path to event-driven architecture that:
- Has zero downtime
- Can be rolled back at any stage
- Preserves data consistency guarantees
```

Type C: Process & Methodology

For testing workflow and best-practice knowledge.
```markdown
**[T1] Q1.** What is the purpose of a code review?

**[T2] Q2.** Given this PR with 3 changed files, identify the 2 most important review comments you would make.

**[T3] Q3.** Design a CI/CD pipeline for this project that balances speed with safety. Justify each stage's inclusion and the order.
```

Type D: Concept Mastery

For testing domain knowledge.
```markdown
**[T1] Q1.** Define "eventual consistency" in your own words.

**[T2] Q2.** Your system uses eventual consistency for user profiles. A user updates their email and immediately tries to log in with the new email. What happens? How do you handle it?

**[T3] Q3.** Compare eventual consistency vs. strong consistency for a financial transaction system. Under what specific conditions would you choose eventual consistency despite the risks?
```

Process

Step 1: Analyze the Subject

  • If code: Read the files, understand the structure
  • If concept: Define the scope and depth
  • If architecture: Map the components

Step 2: Generate Question Set

Default: 3 questions per tier (9 total). Customizable: user can specify count per tier.
Distribution:
Tier 1 (Conceptual):  3 questions — foundation verification
Tier 2 (Applied):     3 questions — practical understanding
Tier 3 (Expert):      3 questions — mastery and edge cases
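The default distribution and the per-tier override could be modelled roughly as follows. This is a sketch under assumptions: `plan_question_counts` and `number_questions` are hypothetical helpers, and the consecutive numbering (Tier 1 first, then Tier 2, then Tier 3) is inferred from the Q1 to Q9 layout of the output template.

```python
DEFAULT_PER_TIER = 3

def plan_question_counts(overrides=None):
    """Return {tier: question count}, starting from 3 per tier.

    overrides: optional {tier: count} supplied by the user.
    """
    counts = {1: DEFAULT_PER_TIER, 2: DEFAULT_PER_TIER, 3: DEFAULT_PER_TIER}
    for tier, n in (overrides or {}).items():
        if tier not in counts:
            raise ValueError(f"unknown tier: {tier}")
        if n < 0:
            raise ValueError("question count must be non-negative")
        counts[tier] = n
    return counts

def number_questions(counts):
    """Assign consecutive question numbers across tiers in ascending tier order."""
    numbering = {}
    next_q = 1
    for tier in sorted(counts):
        numbering[tier] = list(range(next_q, next_q + counts[tier]))
        next_q += counts[tier]
    return numbering

default_numbering = number_questions(plan_question_counts())
```

With the defaults, `default_numbering` maps Tier 1 to questions 1 to 3, Tier 2 to 4 to 6, and Tier 3 to 7 to 9.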

Step 3: Create Answer Key

For each question:
  • Correct answer with explanation
  • Why wrong answers are wrong (for multiple choice)
  • Grading rubric (for open-ended questions)

Step 4: Deliver

Present questions without answers. Hold answer key until user submits responses.
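Steps 3 and 4 together imply an answer key that exists from the start but stays hidden until responses arrive. One way to sketch that constraint is shown below; `AnswerKeyEntry`, `QuizSession`, and their fields are hypothetical names introduced for illustration, not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class AnswerKeyEntry:
    correct_answer: str
    explanation: str
    # Why each wrong option is wrong (multiple-choice questions only).
    wrong_option_notes: Dict[str, str] = field(default_factory=dict)
    # Grading rubric (open-ended questions only).
    rubric: Optional[str] = None

@dataclass
class QuizSession:
    questions: Dict[int, str]
    answer_key: Dict[int, AnswerKeyEntry]
    responses: Dict[int, str] = field(default_factory=dict)
    submitted: bool = False

    def present(self) -> str:
        """Render the questions only; nothing from the answer key is included."""
        return "\n".join(f"Q{n}. {text}" for n, text in sorted(self.questions.items()))

    def submit(self, responses: Dict[int, str]) -> None:
        self.responses = dict(responses)
        self.submitted = True

    def reveal_key(self) -> Dict[int, AnswerKeyEntry]:
        """Release the answer key, but only after responses were submitted."""
        if not self.submitted:
            raise PermissionError("answer key is held until responses are submitted")
        return self.answer_key
```

Calling `reveal_key()` before `submit()` raises an error, which mirrors the "hold answer key" instruction.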

Output Format

Test: [Topic]

Instructions

  • [N] questions across 3 difficulty tiers
  • Answer all questions, then submit for grading
  • Open-ended questions: aim for 2-3 sentences

Tier 1: Conceptual

Q1. [question]
a) [option]
b) [option]
c) [option]
d) [option]
Q2. [question]
Q3. [question]

Tier 2: Applied

Q4. [scenario + question]
Q5. [debugging scenario]
Q6. [what-happens-when scenario]

Tier 3: Expert

Q7. [edge case challenge]
Q8. [design challenge]
Q9. [trade-off analysis]

Submit your answers and I'll grade them with detailed feedback.

Grading (Post-Submission)

When user submits answers:

Results: [Topic]

Score: [X]/[Total] ([percentage]%)

Scoring weights by tier:
  • Tier 1 (Conceptual): 5 pts each (×3 = 15)
  • Tier 2 (Applied): 10 pts each (×3 = 30)
  • Tier 3 (Expert): 15 pts each (×3 = 45)
  • Total: 90 points
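The weighted score and percentage follow directly from the per-tier point values. The sketch below assumes all-or-nothing grading per question, since partial credit for rubric-graded answers is not specified; `score_test` is a hypothetical helper.

```python
POINTS_PER_TIER = {1: 5, 2: 10, 3: 15}

def score_test(results):
    """Compute (earned, total, percentage) from graded results.

    results: list of (tier, is_correct) pairs, one per question.
    Assumes all-or-nothing credit per question.
    """
    total = sum(POINTS_PER_TIER[tier] for tier, _ in results)
    earned = sum(POINTS_PER_TIER[tier] for tier, correct in results if correct)
    percentage = round(100 * earned / total, 1) if total else 0.0
    return earned, total, percentage

# All Tier 1 and Tier 2 answers correct, one of three Tier 3 answers correct.
results = [(1, True)] * 3 + [(2, True)] * 3 + [(3, True), (3, False), (3, False)]
earned, total, percentage = score_test(results)
```

For this example the score is 15 + 30 + 15 = 60 out of 90, or 66.7%. Because the total is derived from the results rather than fixed at 90, the same function works when the user customizes the count per tier.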

Answer Review

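| Q# | Tier | Result | Score |
|----|------|--------|-------|
| 1 | T1 | O/X | /5 |
| 2 | T1 | O/X | /5 |
| 3 | T1 | O/X | /5 |
| 4 | T2 | O/X | /10 |
| 5 | T2 | O/X | /10 |
| 6 | T2 | O/X | /10 |
| 7 | T3 | O/X | /15 |
| 8 | T3 | O/X | /15 |
| 9 | T3 | O/X | /15 |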

Detailed Feedback

Q[N] — [X] Incorrect

Your answer: [what they said]
Correct answer: [what it should be]
Why: [explanation of the correct answer]
Key insight: [what understanding gap this reveals]

Diagnostic Summary

| Dimension | Assessment |
|-----------|------------|
| Concept Connectivity | [How well fundamentals are linked] |
| Procedural Stability | [How reliably they can apply knowledge] |
| Meta-Cognition | [How well they know what they don't know] |

Recommended Next Steps

  • [Specific topics to review based on wrong answers]

When to Use

  • After learning something new — verify understanding
  • Before a code review — test your own knowledge of the codebase
  • Interview preparation — generate practice questions
  • Team knowledge assessment — create standardized tests
  • After refactoring — verify nothing was lost in translation
  • Teaching/documentation — create practice exercises

When NOT to Use

  • For subjective opinion questions (no right answer)
  • When the user just wants information (use deep-dive-analyzer)
  • For trivial topics that don't warrant testing

Integration Notes

  • After deep-dive-analyzer: Analyze → Generate tests to verify understanding
  • With skill-composer: Part of the "Deep Learning Pipeline" (analyze → test → fill gaps → iterate)
  • With adversarial-review: Tests verify understanding; adversarial review challenges decisions