tiered-test-generator
Tiered Test Generator
Multi-level verification question and test scenario generator.
Rules (Absolute)
- Questions must be answerable. Every question has a definite correct answer or clear evaluation criteria. No subjective trick questions.
- Difficulty must be genuine. Tier 3 questions should be genuinely hard, not just verbose versions of Tier 1.
- Coverage must be systematic. Questions should cover the full topic, not cluster around one subtopic.
- Source traceability. For code-based questions, every question must reference specific files, lines, or behaviors.
- No answer leakage. Questions must not contain hints that give away the answer. After generating questions, re-read each one and verify no phrasing, emphasis, or structural pattern reveals the correct answer.
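Several of these rules can be checked mechanically before a question set is delivered. The sketch below is one possible lint pass, not part of the skill itself: the `Question` fields, the `lint_question` and `uncovered_subtopics` helpers, and the leakage heuristic (a correct option more than twice as long as any distractor) are all assumptions made for illustration. Genuine difficulty and subtler leakage still need the re-read the rules describe.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    # Hypothetical record; field names are illustrative only.
    tier: int
    text: str
    options: List[str] = field(default_factory=list)  # empty for open-ended
    correct_index: Optional[int] = None               # index into options
    rubric: str = ""                                  # criteria for open-ended
    source_ref: str = ""                              # e.g. "api/routes.py:45-60"
    subtopic: str = ""


def lint_question(q: Question, code_based: bool = False) -> List[str]:
    """Return the rule violations that can be detected mechanically."""
    problems = []
    has_key = q.correct_index is not None and 0 <= q.correct_index < len(q.options)

    # Rule: questions must be answerable.
    if q.options and not has_key:
        problems.append("multiple choice without a definite correct option")
    if not q.options and not q.rubric:
        problems.append("open-ended without evaluation criteria")

    # Rule: source traceability for code-based questions.
    if code_based and not q.source_ref:
        problems.append("code question without a source reference")

    # Rule: no answer leakage (assumed heuristic: a conspicuously long correct option).
    if has_key and len(q.options) > 1:
        correct = q.options[q.correct_index]
        distractors = [o for i, o in enumerate(q.options) if i != q.correct_index]
        if len(correct) > 2 * max(len(d) for d in distractors):
            problems.append("correct option is much longer than the distractors")
    return problems


def uncovered_subtopics(questions: List[Question], expected: List[str]) -> List[str]:
    """Rule: systematic coverage. List expected subtopics that no question touches."""
    covered = {q.subtopic for q in questions}
    return [s for s in expected if s not in covered]


leaky = Question(
    tier=1,
    text="Does the cache need warming?",
    options=["Yes", "No", "Only when the service has just been restarted"],
    correct_index=2,
    subtopic="caching",
)
print(lint_question(leaky, code_based=True))
print(uncovered_subtopics([leaky], ["caching", "error handling"]))
```

The sample question trips two checks (no source reference, and a correct option that stands out by length), and the coverage helper reports the subtopic that has no question yet.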
Tier System
Tier 1: Conceptual (Understanding)
- Difficulty: Foundation
- Tests: Can you explain what this does and why?
- Format: Definition, purpose, comparison questions
- Bloom's Level: Remember, Understand
Tier 2: Applied (Usage)
- Difficulty: Intermediate
- Tests: Can you use this correctly in context?
- Format: Scenario-based, debugging, "what happens when" questions
- Bloom's Level: Apply, Analyze
Tier 3: Expert (Mastery)
- Difficulty: Advanced
- Tests: Can you handle edge cases, design alternatives, and teach it?
- Format: Edge cases, trade-off analysis, design challenges, "teach this to someone" prompts
- Bloom's Level: Evaluate, Create
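The three tiers can be held as plain data so that later steps look up difficulty and Bloom's levels instead of repeating them. This is a minimal sketch: the `Tier`, `TierSpec`, and `tier_for_bloom_level` names are hypothetical, while the values are copied from the tier descriptions above.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple


class Tier(IntEnum):
    CONCEPTUAL = 1
    APPLIED = 2
    EXPERT = 3


@dataclass(frozen=True)
class TierSpec:
    difficulty: str
    tests: str
    bloom_levels: Tuple[str, ...]


TIERS = {
    Tier.CONCEPTUAL: TierSpec(
        "Foundation",
        "Can you explain what this does and why?",
        ("Remember", "Understand"),
    ),
    Tier.APPLIED: TierSpec(
        "Intermediate",
        "Can you use this correctly in context?",
        ("Apply", "Analyze"),
    ),
    Tier.EXPERT: TierSpec(
        "Advanced",
        "Can you handle edge cases, design alternatives, and teach it?",
        ("Evaluate", "Create"),
    ),
}


def tier_for_bloom_level(level: str) -> Tier:
    """Map a Bloom's level name to the tier that targets it."""
    wanted = level.strip().capitalize()
    for tier, spec in TIERS.items():
        if wanted in spec.bloom_levels:
            return tier
    raise ValueError(f"unknown Bloom's level: {level!r}")


print(tier_for_bloom_level("analyze").name)  # APPLIED
```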
Test Types
Type A: Code Comprehension
For testing understanding of specific code.
**[T1] Q1.** What is the primary responsibility of the `UserService` class?
a) Database access
b) Authentication
c) User CRUD operations
d) Session management
**[T2] Q2.** Given this function, what happens when `input` is `None`?
```python
def process(input):
    return input.strip().lower()
```
a) Returns empty string
b) Raises AttributeError
c) Returns None
d) Silently fails
**[T3] Q3.** The current error handling in `api/routes.py:45-60` catches all exceptions generically. Design a more robust error handling strategy that:
- Distinguishes client errors from server errors
- Provides actionable error messages
- Doesn't leak internal details
- Supports error aggregation for monitoring
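A code comprehension question should have an answer that can be confirmed by running the code. As a quick check of the sample Q2, the sketch below calls `process` with `None`, Python's null value; the `outcome_for` helper is hypothetical and exists only to capture the result.

```python
def process(input):
    # Same body as the sample question; `input` shadows the builtin on purpose
    # so the snippet matches the question text.
    return input.strip().lower()


def outcome_for(value):
    """Return process(value), or the name of the exception it raises."""
    try:
        return process(value)
    except Exception as exc:
        return type(exc).__name__


print(outcome_for(None))     # AttributeError
print(outcome_for("  Hi "))  # hi
```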
Type B: Architecture & Design
For testing system-level understanding.
**[T1] Q1.** What architectural pattern does this codebase follow?
**[T2] Q2.** If read traffic increases 100x, which component becomes the bottleneck first? What's your mitigation strategy?
**[T3] Q3.** The current system uses synchronous inter-service communication. Design a migration path to event-driven architecture that:
- Has zero downtime
- Can be rolled back at any stage
- Preserves data consistency guarantees
Type C: Process & Methodology
For testing workflow and best-practice knowledge.
**[T1] Q1.** What is the purpose of a code review?
**[T2] Q2.** Given this PR with 3 changed files, identify the 2 most important review comments you would make.
**[T3] Q3.** Design a CI/CD pipeline for this project that balances speed with safety. Justify each stage's inclusion and the order.
Type D: Concept Mastery
For testing domain knowledge.
**[T1] Q1.** Define "eventual consistency" in your own words.
**[T2] Q2.** Your system uses eventual consistency for user profiles. A user updates their email and immediately tries to log in with the new email. What happens? How do you handle it?
**[T3] Q3.** Compare eventual consistency vs. strong consistency for a financial transaction system. Under what specific conditions would you choose eventual consistency despite the risks?
Process
Step 1: Analyze the Subject
- If code: Read the files, understand the structure
- If concept: Define the scope and depth
- If architecture: Map the components
Step 2: Generate Question Set
Default: 3 questions per tier (9 total).
Customizable: user can specify count per tier.
Distribution:
Tier 1 (Conceptual): 3 questions (foundation verification)
Tier 2 (Applied): 3 questions (practical understanding)
Tier 3 (Expert): 3 questions (mastery and edge cases)
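The default and the per-tier override can be expressed as a small planning step. This is a sketch under assumptions: `plan_distribution` and `question_numbers` are hypothetical helper names, and consecutive numbering across tiers is inferred from the Q1 to Q9 layout used in the output format.

```python
DEFAULT_PER_TIER = 3
TIER_LABELS = {1: "Conceptual", 2: "Applied", 3: "Expert"}


def plan_distribution(per_tier=None):
    """Return {tier: question count}, applying any user overrides to the default."""
    counts = {tier: DEFAULT_PER_TIER for tier in TIER_LABELS}
    for tier, count in (per_tier or {}).items():
        if tier not in TIER_LABELS:
            raise ValueError(f"unknown tier: {tier}")
        if count < 0:
            raise ValueError("question count cannot be negative")
        counts[tier] = count
    return counts


def question_numbers(counts):
    """Number questions consecutively, tier by tier."""
    numbering, next_number = {}, 1
    for tier in sorted(counts):
        numbering[tier] = list(range(next_number, next_number + counts[tier]))
        next_number += counts[tier]
    return numbering


default = plan_distribution()
print(sum(default.values()))         # 9
print(question_numbers(default)[3])  # [7, 8, 9]
print(plan_distribution({3: 5}))     # {1: 3, 2: 3, 3: 5}
```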
Step 3: Create Answer Key
For each question:
- Correct answer with explanation
- Why wrong answers are wrong (for multiple choice)
- Grading rubric (for open-ended questions)
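One way to keep answer-key entries complete is to store the three parts together and report whichever is missing. The `AnswerKeyEntry` structure and its `missing_parts` method below are hypothetical, shown here against the sample Q2 from Type A.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AnswerKeyEntry:
    # Hypothetical structure; field names are illustrative only.
    question_number: int
    correct_answer: str
    explanation: str
    wrong_option_notes: Dict[str, str] = field(default_factory=dict)  # letter -> why wrong
    rubric: List[str] = field(default_factory=list)                   # open-ended criteria

    def missing_parts(self, option_letters: str = "") -> List[str]:
        """List which answer-key parts are still missing for this question."""
        missing = []
        if not (self.correct_answer and self.explanation):
            missing.append("correct answer with explanation")
        if option_letters:  # multiple choice: every wrong option needs a reason
            unexplained = [
                letter for letter in option_letters
                if letter != self.correct_answer and letter not in self.wrong_option_notes
            ]
            if unexplained:
                missing.append("why wrong: " + ", ".join(unexplained))
        elif not self.rubric:  # open-ended: a rubric is required
            missing.append("grading rubric")
        return missing


q2 = AnswerKeyEntry(
    question_number=2,
    correct_answer="b",
    explanation="None has no strip() method, so the attribute lookup raises AttributeError.",
    wrong_option_notes={
        "a": "No string is ever produced; the call fails before strip() runs.",
        "c": "The exception is raised before the function can return anything.",
    },
)
print(q2.missing_parts("abcd"))  # ['why wrong: d']
```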
Step 4: Deliver
Present questions without answers. Hold answer key until user submits responses.
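The hold-until-submission rule can be enforced in code rather than by convention. A minimal sketch, assuming a hypothetical `QuizSession` wrapper that keys questions and answers by question number:

```python
class QuizSession:
    """Holds the answer key back until responses have been submitted."""

    def __init__(self, questions, answer_key):
        self._questions = dict(questions)    # question number -> question text
        self._answer_key = dict(answer_key)  # question number -> correct answer
        self._responses = None

    def present(self):
        """Questions only; nothing from the answer key is exposed here."""
        return dict(self._questions)

    def submit(self, responses):
        """Accept responses once every question has an answer."""
        unanswered = sorted(set(self._questions) - set(responses))
        if unanswered:
            raise ValueError(f"unanswered questions: {unanswered}")
        self._responses = dict(responses)

    def answer_key(self):
        if self._responses is None:
            raise RuntimeError("answer key is held until responses are submitted")
        return dict(self._answer_key)


session = QuizSession({1: "What is the purpose of a code review?"}, {1: "see rubric"})
try:
    session.answer_key()
except RuntimeError as exc:
    print(exc)  # answer key is held until responses are submitted

session.submit({1: "To catch defects and share knowledge."})
print(session.answer_key())  # {1: 'see rubric'}
```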
Output Format
Test: [Topic]
Instructions
- [N] questions across 3 difficulty tiers
- Answer all questions, then submit for grading
- Open-ended questions: aim for 2-3 sentences
Tier 1: Conceptual
Q1. [question]
a) [option] b) [option] c) [option] d) [option]
Q2. [question]
Q3. [question]
Tier 2: Applied
Q4. [scenario + question]
Q5. [debugging scenario]
Q6. [what-happens-when scenario]
Tier 3: Expert
Q7. [edge case challenge]
Q8. [design challenge]
Q9. [trade-off analysis]
Submit your answers and I'll grade them with detailed feedback.
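The template above can be filled programmatically. The sketch below is an assumption-laden illustration: `render_test` and its input shape are hypothetical, and section titles are emitted as plain lines because the template shows only the heading text, not its markup.

```python
TIER_TITLES = {1: "Tier 1: Conceptual", 2: "Tier 2: Applied", 3: "Tier 3: Expert"}


def render_test(topic, questions_by_tier):
    """Render a test from {tier: [(question_text, [options...]), ...]}."""
    total = sum(len(qs) for qs in questions_by_tier.values())
    lines = [
        f"Test: {topic}",
        "Instructions",
        f"- {total} questions across 3 difficulty tiers",
        "- Answer all questions, then submit for grading",
        "- Open-ended questions: aim for 2-3 sentences",
    ]
    number = 1
    for tier in sorted(questions_by_tier):
        lines.append(TIER_TITLES[tier])
        for text, options in questions_by_tier[tier]:
            lines.append(f"Q{number}. {text}")
            if options:  # multiple choice: options on one line, lettered a), b), ...
                lines.append(
                    " ".join(f"{letter}) {opt}" for letter, opt in zip("abcdefgh", options))
                )
            number += 1
    lines.append("Submit your answers and I'll grade them with detailed feedback.")
    return "\n".join(lines)


print(render_test("Caching", {
    1: [("What is a cache?", ["A", "B", "C", "D"])],
    2: [("A cache returns stale data after a deploy. What do you check first?", [])],
    3: [("Design an invalidation strategy and justify the trade-offs.", [])],
}))
```

Question numbers run continuously across tiers, matching the Q1 to Q9 numbering in the template.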
Grading (Post-Submission)
When user submits answers:
Results: [Topic]
Score: [X]/[Total] ([percentage]%)
Scoring weights by tier:
- Tier 1 (Conceptual): 5 pts each (×3 = 15)
- Tier 2 (Applied): 10 pts each (×3 = 30)
- Tier 3 (Expert): 15 pts each (×3 = 45)
- Total: 90 points
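The weights translate directly into a score and percentage. In this sketch the `POINTS_PER_TIER` table comes from the list above, while the `max_score` and `score` helpers and the all-or-nothing marking per question are assumptions; the source does not say whether open-ended answers can earn partial credit.

```python
POINTS_PER_TIER = {1: 5, 2: 10, 3: 15}


def max_score(counts=None):
    """Total available points for a {tier: question count} distribution."""
    counts = counts or {1: 3, 2: 3, 3: 3}
    return sum(POINTS_PER_TIER[tier] * n for tier, n in counts.items())


def score(results):
    """Score a list of (tier, is_correct) pairs as (earned, total, percentage)."""
    earned = sum(POINTS_PER_TIER[tier] for tier, ok in results if ok)
    total = sum(POINTS_PER_TIER[tier] for tier, _ in results)
    percentage = round(100 * earned / total) if total else 0
    return earned, total, percentage


# All Tier 1 and Tier 2 answers correct, one of three Tier 3 answers correct.
results = [(1, True)] * 3 + [(2, True)] * 3 + [(3, True), (3, False), (3, False)]
print(max_score())     # 90
print(score(results))  # (60, 90, 67)
```

Because Tier 3 carries half of the 90 points, missing two expert questions costs as much as missing all three applied ones.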
Answer Review
| Q# | Tier | Result | Score |
|---|---|---|---|
| 1 | T1 | O/X | /5 |
| 2 | T1 | O/X | /5 |
| 3 | T1 | O/X | /5 |
| 4 | T2 | O/X | /10 |
| 5 | T2 | O/X | /10 |
| 6 | T2 | O/X | /10 |
| 7 | T3 | O/X | /15 |
| 8 | T3 | O/X | /15 |
| 9 | T3 | O/X | /15 |
Detailed Feedback
Q[N] — [X] Incorrect
Your answer: [what they said]
Correct answer: [what it should be]
Why: [explanation of the correct answer]
Key insight: [what understanding gap this reveals]
Diagnostic Summary
| Dimension | Assessment |
|---|---|
| Concept Connectivity | [How well fundamentals are linked] |
| Procedural Stability | [How reliably they can apply knowledge] |
| Meta-Cognition | [How well they know what they don't know] |
Recommended Next Steps
- [Specific topics to review based on wrong answers]
When to Use
- After learning something new — verify understanding
- Before a code review — test your own knowledge of the codebase
- Interview preparation — generate practice questions
- Team knowledge assessment — create standardized tests
- After refactoring — verify nothing was lost in translation
- Teaching/documentation — create practice exercises
When NOT to Use
- For subjective opinion questions (no right answer)
- When the user just wants information (use deep-dive-analyzer)
- For trivial topics that don't warrant testing
Integration Notes
- After deep-dive-analyzer: Analyze → Generate tests to verify understanding
- With skill-composer: Part of the "Deep Learning Pipeline" (analyze → test → fill gaps → iterate)
- With adversarial-review: Tests verify understanding; adversarial review challenges decisions