hierarchical-matching-systems
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseHierarchical Matching Systems
分层匹配系统
This skill provides rigid diagnostic and architectural procedures for hierarchical matching systems. Follow checklists exactly—matching bugs often hide in skipped steps.
本技能为分层匹配系统提供严谨的诊断与架构流程。请严格遵循检查清单——匹配类bug常隐藏在跳过的步骤中。
Quick Reference
快速参考
- Algorithm selection: See references/decision-guide.md
- Algorithm details: See references/algorithms.md
- 算法选择:参见 references/decision-guide.md
- 算法细节:参见 references/algorithms.md
1. Problem Classification Checklist
1. 问题分类检查清单
Before any work, classify the problem. Check ALL that apply:
□ TWO-SIDED: Both sides have preferences (students↔schools, workers↔jobs)
□ ONE-SIDED: Only one side has preferences (tasks→workers, items→bins)
□ HIERARCHICAL: Entities exist at multiple levels (org→dept→team→person)
□ WEIGHTED: Matches have costs/scores to optimize
□ CONSTRAINED: Hard limits exist (capacity, exclusions, required pairings)
□ STABLE: Solution must resist defection (no blocking pairs)
□ OPTIMAL: Solution must minimize/maximize objective function
□ FUZZY: Entities may partially match (entity resolution, deduplication)Classification determines algorithm family. Proceed to Section 2 for architecture or Section 3 for debugging.
开展任何工作前,先对问题进行分类。勾选所有适用项:
□ 双边匹配:双方均有偏好(学生↔学校、工人↔工作)
□ 单边匹配:仅一方有偏好(任务→工人、物品→容器)
□ 分层匹配:实体存在于多个层级(组织→部门→团队→个人)
□ 加权匹配:匹配带有成本/分数以优化
□ 约束匹配:存在硬性限制(容量、排除项、强制配对)
□ 稳定匹配:解决方案需避免“背叛”(无阻塞对)
□ 最优匹配:解决方案需最小化/最大化目标函数
□ 模糊匹配:实体可能部分匹配(实体解析、去重)分类结果将决定算法选型。如需架构设计请进入第2节,如需调试请进入第3节。
2. Architecture Procedure
2. 架构设计流程
Follow these phases in order when designing a new matching system.
设计新匹配系统时,请按以下阶段依次进行。
Phase 2.1: Requirements Translation
阶段2.1:需求转化
Convert each business requirement into formal constraints:
| Business Requirement | Constraint Type | Formal Expression |
|---|---|---|
| "Each student gets one school" | Capacity | ` |
| "Schools have seat limits" | Capacity | ` |
| "Siblings must be together" | Coupling | |
| "Student X cannot attend Y" | Exclusion | |
| "Priority for residents" | Preference ordering | |
Checklist:
□ List ALL business requirements
□ Classify each as: capacity | coupling | exclusion | ordering | soft preference
□ Identify conflicts between requirements (document tradeoffs)
□ Distinguish HARD constraints (must satisfy) from SOFT (optimize toward)
□ Validate translations with stakeholder examples将每项业务需求转化为形式化约束:
| 业务需求 | 约束类型 | 形式化表达 |
|---|---|---|
| "每位学生分配一所学校" | 容量约束 | ` |
| "学校有招生名额限制" | 容量约束 | ` |
| "兄弟姐妹必须同校" | 耦合约束 | |
| "学生X不能就读Y校" | 排除约束 | |
| "本地居民优先" | 偏好排序 | |
检查清单:
□ 列出所有业务需求
□ 将每项需求分类为:容量约束 | 耦合约束 | 排除约束 | 排序约束 | 软偏好
□ 识别需求间的冲突(记录权衡方案)
□ 区分硬性约束(必须满足)与软性约束(优化目标)
□ 通过干系人示例验证转化结果Phase 2.2: Algorithm Selection
阶段2.2:算法选择
Use references/decision-guide.md to select algorithm. Verify:
□ Algorithm handles all HARD constraints
□ Algorithm can optimize SOFT constraints (or document gaps)
□ Complexity acceptable for data size (see references/algorithms.md)
□ Stability requirements met (if two-sided)
□ Optimality requirements met (if weighted)使用 references/decision-guide.md 选择算法,并验证:
□ 算法支持所有硬性约束
□ 算法可优化软性约束(或记录缺口)
□ 复杂度适配数据规模(参见references/algorithms.md)
□ 满足稳定性要求(若为双边匹配)
□ 满足最优性要求(若为加权匹配)Phase 2.3: Data Model Design
阶段2.3:数据模型设计
Define entities, relationships, and preference representation:
□ Entity schema for each side (attributes, identifiers)
□ Preference representation (ranked list | score matrix | pairwise comparisons)
□ Constraint encoding (how exclusions/couplings are stored)
□ Hierarchy representation (if multi-level: tree | DAG | adjacency list)
□ Tie-breaking rules (deterministic ordering for equal preferences)定义实体、关系与偏好表示方式:
□ 双方实体的 schema(属性、标识符)
□ 偏好表示形式(排名列表 | 分数矩阵 | 两两比较)
□ 约束编码方式(排除项/耦合关系的存储格式)
□ 层级表示方式(若为多层级:树结构 | 有向无环图 | 邻接表)
□ 平局处理规则(相同偏好下的确定性排序)Phase 2.4: Interface Contracts
阶段2.4:接口契约
Specify inputs, outputs, and invariants:
Input Contract:
□ Preference format and validation rules
□ Constraint format and validation rules
□ Required vs optional fields
□ How missing preferences are handled (reject | default rank | exclude)Output Contract:
□ Match format (pairs | assignment map | ranked list)
□ Unmatched entity handling (explicit list | null matches | error)
□ Match metadata (scores, stability proof, constraint satisfaction report)Invariants:
□ Determinism: same input → same output (document randomness if any)
□ Completeness: all entities matched OR explicitly unmatched
□ Validity: all matches satisfy hard constraints指定输入、输出与不变量:
输入契约:
□ 偏好格式与验证规则
□ 约束格式与验证规则
□ 必填与可选字段
□ 缺失偏好的处理方式(拒绝 | 默认排名 | 排除)输出契约:
□ 匹配结果格式(配对 | 分配映射 | 排名列表)
□ 未匹配实体的处理方式(显式列表 | 空匹配 | 报错)
□ 匹配元数据(分数、稳定性证明、约束满足报告)不变量:
□ 确定性:相同输入→相同输出(若存在随机性需记录)
□ 完整性:所有实体要么已匹配,要么显式标记为未匹配
□ 有效性:所有匹配均满足硬性约束Phase 2.5: Testing Strategy
阶段2.5:测试策略
Define validation before implementation:
□ Unit tests for preference parsing and constraint validation
□ Property tests: stability, optimality, constraint satisfaction
□ Edge cases: empty inputs, single entity, all tied preferences
□ Regression tests from known-good examples
□ Performance benchmarks at target scale在实现前定义验证方案:
□ 偏好解析与约束验证的单元测试
□ 属性测试:稳定性、最优性、约束满足性
□ 边缘案例:空输入、单个实体、所有偏好相同
□ 基于已知正确示例的回归测试
□ 目标规模下的性能基准测试3. Debugging Procedure
3. 调试流程
Follow this diagnostic sequence for any matching issue. Do not skip steps.
遇到任何匹配问题,请遵循以下诊断流程,请勿跳过步骤。
Phase 3.1: Symptom Classification
阶段3.1:症状分类
Identify the symptom category:
| Symptom | Category | Go To |
|---|---|---|
| Same inputs, different outputs | INSTABILITY | 3.2 |
| Matches violate business rules | CONSTRAINT VIOLATION | 3.3 |
| Matches technically valid but "wrong" | PREFERENCE MISALIGNMENT | 3.4 |
| Errors with nested/hierarchical data | HIERARCHY BUG | 3.5 |
| Poor performance at scale | PERFORMANCE | 3.6 |
识别症状类别:
| 症状 | 类别 | 跳转至 |
|---|---|---|
| 相同输入产生不同输出 | 不稳定性 | 3.2 |
| 匹配结果违反业务规则 | 约束违规 | 3.3 |
| 匹配结果技术上有效但“不符合预期” | 偏好不一致 | 3.4 |
| 嵌套/分层数据出现错误 | 层级bug | 3.5 |
| 大规模场景下性能不佳 | 性能问题 | 3.6 |
Phase 3.2: Instability Diagnosis
阶段3.2:不稳定性诊断
Root causes of non-deterministic matches:
□ RANDOMNESS: Check for unseeded RNG in tie-breaking
→ Fix: Use deterministic tie-breaker (lexicographic ID, timestamp)
□ FLOATING POINT: Check score comparisons for floating point issues
→ Fix: Use epsilon comparison or integer scores
□ HASH ORDERING: Check if iteration order depends on hash maps
→ Fix: Sort keys before iteration
□ PARALLEL RACE: Check for concurrent modifications
→ Fix: Synchronize or use sequential processing
□ INPUT ORDERING: Check if algorithm is order-sensitive
→ Fix: Canonicalize input order (sort by ID)Verification:
□ Run matching 10x with identical inputs
□ Diff all outputs
□ If any differ, add logging to identify divergence point非确定性匹配的根本原因:
□ 随机性:检查平局处理中是否使用未设种子的RNG
→ 修复方案:使用确定性平局规则(字典序ID、时间戳)
□ 浮点数问题:检查分数比较是否存在浮点数精度问题
→ 修复方案:使用epsilon比较或整数分数
□ 哈希排序:检查迭代顺序是否依赖哈希映射
→ 修复方案:迭代前对键进行排序
□ 并行竞争:检查是否存在并发修改
→ 修复方案:同步处理或使用顺序执行
□ 输入顺序:检查算法是否对输入顺序敏感
→ 修复方案:规范输入顺序(按ID排序)验证:
□ 使用相同输入运行匹配10次
□ 对比所有输出结果
□ 若存在差异,添加日志以定位分歧点Phase 3.3: Constraint Violation Diagnosis
阶段3.3:约束违规诊断
Diagnostic sequence:
1. □ IDENTIFY: Which specific constraint is violated?
→ List the violated constraint and the violating match
2. □ TRACE: Where should constraint be enforced?
→ Map constraint to code location (filter | validation | algorithm step)
3. □ VERIFY ENCODING: Is constraint correctly represented?
→ Print constraint data structure, verify against requirement
4. □ VERIFY ENFORCEMENT: Is constraint checked at right time?
→ Add logging before/after enforcement point
5. □ CHECK ORDERING: Is constraint checked before conflicting decisions?
→ Trace decision sequence, verify constraint checked first
6. □ CHECK COMPLETENESS: Are all instances covered?
→ Enumerate all entities that should be constrainedCommon failure patterns:
| Pattern | Symptom | Fix |
|---|---|---|
| Late enforcement | Valid intermediate state, invalid final | Move check earlier |
| Partial coverage | Some entities constrained, others not | Enumerate all cases |
| Soft vs hard confusion | Constraint violated for "better" match | Reclassify as hard |
| Stale data | Constraint on outdated values | Refresh before check |
诊断流程:
1. □ 定位:明确违反了哪项具体约束?
→ 列出违规约束及对应的匹配结果
2. □ 追踪:约束应在何处执行?
→ 将约束映射到代码位置(过滤器 | 验证器 | 算法步骤)
3. □ 验证编码:约束是否正确表示?
→ 打印约束数据结构,与需求进行对比
4. □ 验证执行:约束是否在正确时机被检查?
→ 在执行点前后添加日志
5. □ 检查顺序:约束是否在冲突决策前被检查?
→ 追踪决策流程,验证约束先于冲突决策执行
6. □ 检查完整性:是否覆盖了所有实例?
→ 枚举所有应受约束的实体常见失败模式:
| 模式 | 症状 | 修复方案 |
|---|---|---|
| 延迟执行 | 中间状态有效,最终状态无效 | 将检查提前 |
| 部分覆盖 | 部分实体受约束,其他不受 | 枚举所有场景 |
| 软硬混淆 | 为了“更优”匹配违反约束 | 重新归类为硬性约束 |
| 数据过期 | 约束基于过时值 | 检查前刷新数据 |
Phase 3.4: Preference Misalignment Diagnosis
阶段3.4:偏好不一致诊断
When matches are valid but don't reflect intended priorities:
1. □ EXTRACT: Get the actual preference data used
→ Log/print the exact preference structure at match time
2. □ COMPARE: Check against expected preferences
→ Side-by-side diff with business-stated priorities
3. □ TRACE TRANSFORMATION: Follow preference from input to algorithm
→ Log at each transformation step (parsing, normalization, scoring)
4. □ CHECK SCORING: Verify score calculation
→ Manual calculation for 2-3 example cases
5. □ CHECK AGGREGATION: If multi-criteria, verify combination
→ Test each criterion independently, then combined
6. □ CHECK NORMALIZATION: Verify scale/range handling
→ Check for min/max, z-score, or rank normalization bugsScoring function checklist:
□ Direction correct (higher = better or lower = better, consistently)
□ Scale appropriate (no single factor dominating)
□ Missing values handled (null → 0? → excluded? → default?)
□ Ties handled explicitly (not left to floating point chance)
□ Edge cases: extreme values, all same values, single candidate当匹配结果有效但未反映预期优先级时:
1. □ 提取:获取匹配时实际使用的偏好数据
→ 记录/打印匹配时的精确偏好结构
2. □ 对比:与预期偏好进行检查
→ 将实际偏好与业务声明的优先级进行并排对比
3. □ 追踪转换:跟随偏好从输入到算法的全流程
→ 在每个转换步骤(解析、归一化、打分)添加日志
4. □ 检查打分:验证分数计算逻辑
→ 手动计算2-3个示例场景
5. □ 检查聚合:若为多准则,验证组合逻辑
→ 独立测试每个准则,再测试组合效果
6. □ 检查归一化:验证尺度/范围处理逻辑
→ 检查min/max、z-score或排名归一化是否存在bug打分函数检查清单:
□ 方向正确(始终保持“越高越好”或“越低越好”)
□ 尺度合适(无单一因素主导结果)
□ 缺失值处理正确(null→0?→排除?→默认值?)
□ 平局处理明确(不依赖浮点数随机性)
□ 边缘案例:极端值、所有值相同、单个候选者Phase 3.5: Hierarchy Traversal Diagnosis
阶段3.5:层级遍历诊断
For multi-level matching issues:
1. □ VISUALIZE: Draw the hierarchy with the problematic match
→ Tree diagram showing all levels and the match path
2. □ CHECK INHERITANCE: Do child constraints inherit from parent?
→ Verify constraint propagation rules
3. □ CHECK AGGREGATION: How do child preferences roll up?
→ Verify aggregation function (sum | max | weighted | majority)
4. □ CHECK LEVEL INTERACTION: Can matches cross levels?
→ Document allowed/forbidden cross-level matches
5. □ CHECK TRAVERSAL ORDER: Top-down or bottom-up?
→ Verify algorithm processes levels in intended order
6. □ CHECK PARTIAL MATCHES: Can a parent match without all children?
→ Document completeness requirements per levelCommon hierarchy bugs:
| Bug | Symptom | Fix |
|---|---|---|
| Missing propagation | Child ignores parent constraint | Add inheritance logic |
| Double counting | Same entity weighted multiple times | Deduplicate in aggregation |
| Level skipping | Match at wrong level | Add level validation |
| Orphan handling | Unattached children cause errors | Define orphan policy |
针对多层级匹配问题:
1. □ 可视化:绘制包含问题匹配的层级结构
→ 树状图展示所有层级及匹配路径
2. □ 检查继承:子约束是否从父级继承?
→ 验证约束传播规则
3. □ 检查聚合:子偏好如何向上汇总?
→ 验证聚合函数(求和 | 最大值 | 加权 | 多数投票)
4. □ 检查层级交互:匹配是否允许跨层级?
→ 记录允许/禁止跨层级匹配的规则
5. □ 检查遍历顺序:自上而下还是自下而上?
→ 验证算法是否按预期顺序处理层级
6. □ 检查部分匹配:父级是否可在未匹配所有子级的情况下完成匹配?
→ 记录各层级的完整性要求常见层级bug:
| Bug | 症状 | 修复方案 |
|---|---|---|
| 传播缺失 | 子级忽略父级约束 | 添加继承逻辑 |
| 重复计数 | 同一实体被多次加权 | 聚合时去重 |
| 层级跳过 | 匹配发生在错误层级 | 添加层级验证 |
| 孤立节点处理 | 未关联的子级导致错误 | 定义孤立节点处理策略 |
Phase 3.6: Performance Diagnosis
阶段3.6:性能诊断
For scale and speed issues:
1. □ PROFILE: Identify the slow component
→ Time each phase: input parsing, preference building, matching, output
2. □ COMPLEXITY CHECK: Verify actual vs expected complexity
→ Log iteration counts, compare to theoretical O(n)
3. □ MEMORY CHECK: Profile memory usage
→ Watch for preference matrix explosion (n² space)
4. □ ALGORITHM FIT: Verify algorithm appropriate for scale
→ See references/algorithms.md for complexity comparison
5. □ CACHING: Check for redundant computation
→ Log cache hits/misses for preference lookups
6. □ BATCH VS STREAMING: Check processing model
→ Full recomputation vs incremental updates针对规模与速度问题:
1. □ 性能分析:定位缓慢组件
→ 统计各阶段耗时:输入解析、偏好构建、匹配计算、输出
2. □ 复杂度检查:验证实际复杂度与预期是否一致
→ 记录迭代次数,与理论O(n)对比
3. □ 内存检查:分析内存使用情况
→ 警惕偏好矩阵爆炸(n²空间复杂度)
4. □ 算法适配:验证算法是否适配当前规模
→ 参见references/algorithms.md的复杂度对比
5. □ 缓存检查:是否存在冗余计算
→ 记录偏好查询的缓存命中/未命中情况
6. □ 批量 vs 流式:检查处理模型
→ 全量重新计算 vs 增量更新4. Testing & Validation Procedure
4. 测试与验证流程
4.1: Correctness Properties
4.1 正确性属性
Test these properties for every matching system:
□ DETERMINISM: run(input) = run(input) (10 trials minimum)
□ COMPLETENESS: all entities either matched or explicitly unmatched
□ VALIDITY: all matches satisfy all hard constraints
□ STABILITY (if applicable): no blocking pairs exist
□ OPTIMALITY (if applicable): objective function at expected value所有匹配系统均需测试以下属性:
□ 确定性:run(input) = run(input)(至少运行10次)
□ 完整性:所有实体要么已匹配,要么显式标记为未匹配
□ 有效性:所有匹配均满足所有硬性约束
□ 稳定性(若适用):不存在阻塞对
□ 最优性(若适用):目标函数达到预期值4.2: Stability Verification (Two-Sided Matching)
4.2 稳定性验证(双边匹配)
For stable matching, verify no blocking pairs:
python
undefined对于稳定匹配,需验证不存在阻塞对:
python
undefinedPseudocode - verify no blocking pair exists
伪代码 - 验证不存在阻塞对
for each unmatched_pair (a, b):
if a prefers b over current_match(a):
if b prefers a over current_match(b):
FAIL: blocking pair found (a, b)
undefined□ Enumerate all non-matched pairs
□ Check mutual preference for each
□ Report any blocking pairs found
□ For large instances, sample-check (document coverage)
undefinedfor each unmatched_pair (a, b):
if a prefers b over current_match(a):
if b prefers a over current_match(b):
FAIL: 发现阻塞对 (a, b)
undefined□ 枚举所有未配对的实体组合
□ 检查每对组合的相互偏好
□ 报告所有发现的阻塞对
□ 对于大规模实例,采用抽样检查(记录覆盖范围)
undefined4.3: Constraint Satisfaction Verification
4.3 约束满足验证
□ List all hard constraints
□ For each match, verify against each constraint
□ Generate constraint satisfaction report
□ Flag any violations with specific match and constraint□ 列出所有硬性约束
□ 针对每个匹配结果,验证其符合所有约束
□ 生成约束满足报告
□ 标记任何违规的匹配结果及对应的约束4.4: Edge Case Test Suite
4.4 边缘案例测试套件
Mandatory test cases:
□ Empty input (no entities on one or both sides)
□ Single entity (one-to-one degenerate case)
□ All identical preferences (maximum tie scenario)
□ Mutually exclusive preferences (everyone wants same thing)
□ Impossible constraints (unsatisfiable, should error clearly)
□ Maximum capacity (all slots exactly filled)
□ Minimum capacity (barely enough slots)
□ Self-referential (can entity match itself? test boundary)
□ Circular preferences (A→B→C→A)必选测试用例:
□ 空输入(一方或双方无实体)
□ 单个实体(一对一退化场景)
□ 所有偏好相同(最大平局场景)
□ 互斥偏好(所有人都想要同一个对象)
□ 不可能约束(无法满足,应清晰报错)
□ 最大容量(所有名额刚好填满)
□ 最小容量(仅够满足极少需求)
□ 自引用(实体能否与自身匹配?测试边界)
□ 循环偏好(A→B→C→A)4.5: Regression Test Maintenance
4.5 回归测试维护
□ Capture real production cases that revealed bugs
□ Minimize to smallest reproducing example
□ Document expected behavior explicitly
□ Run on every change to matching logic□ 捕获暴露bug的真实生产场景
□ 简化为最小复现示例
□ 明确记录预期行为
□ 每次修改匹配逻辑时运行回归测试5. Review Checklist
5. 评审检查清单
When reviewing matching system code or design:
评审匹配系统代码或设计时:
5.1: Design Review
5.1 设计评审
□ Problem correctly classified (Section 1)
□ Algorithm appropriate for problem class (references/decision-guide.md)
□ All business requirements mapped to constraints (Section 2.1)
□ Hard vs soft constraints clearly distinguished
□ Tie-breaking is deterministic and documented
□ Hierarchy semantics defined (if applicable)□ 问题分类正确(第1节)
□ 算法适配问题类别(参见references/decision-guide.md)
□ 所有业务需求均映射为约束(第2.1节)
□ 硬性与软性约束区分明确
□ 平局处理规则确定且已记录
□ 层级语义已定义(若适用)5.2: Implementation Review
5.2 实现评审
□ Preference representation matches algorithm requirements
□ Constraints enforced at correct point in algorithm
□ No hidden randomness (unseeded RNG, hash iteration)
□ Floating point comparison handled correctly
□ Edge cases handled (empty, single, ties)
□ Error messages identify specific constraint violations□ 偏好表示方式符合算法要求
□ 约束在算法的正确阶段执行
□ 无隐藏随机性(未设种子的RNG、哈希迭代)
□ 浮点数比较处理正确
□ 边缘案例已处理(空输入、单个实体、平局)
□ 错误消息明确指出具体约束违规5.3: Testing Review
5.3 测试评审
□ All properties from 4.1 tested
□ Edge cases from 4.4 covered
□ Performance benchmarked at realistic scale
□ Regression tests exist for past bugs□ 已测试4.1节中的所有属性
□ 已覆盖4.4节中的边缘案例
□ 已在真实规模下进行性能基准测试
□ 已为过往bug建立回归测试Appendix: Common Anti-Patterns
附录:常见反模式
| Anti-Pattern | Problem | Solution |
|---|---|---|
| Greedy first-come | Order-dependent, non-optimal | Use proper algorithm |
| Score = sum(all factors) | One factor dominates | Normalize scales |
| Retry until valid | Non-deterministic, slow | Fix constraint order |
| Global preference cache | Stale across updates | Invalidate on change |
| String matching for entities | Case/whitespace bugs | Use canonical IDs |
| Float equality for ties | Non-deterministic | Use epsilon or integer |
| Recursive hierarchy walk | Stack overflow risk | Use iterative with explicit stack |
| N² preference matrix | Memory explosion | Use sparse representation |
| 反模式 | 问题 | 解决方案 |
|---|---|---|
| 贪心先到先得 | 依赖输入顺序,非最优 | 使用标准算法 |
| 分数=所有因素之和 | 单一因素主导结果 | 归一化尺度 |
| 重试直到有效 | 非确定性、缓慢 | 修复约束执行顺序 |
| 全局偏好缓存 | 更新后数据过期 | 变更时失效缓存 |
| 实体字符串匹配 | 大小写/空格bug | 使用标准ID |
| 浮点数判断平局 | 非确定性 | 使用epsilon或整数 |
| 递归层级遍历 | 栈溢出风险 | 使用迭代式显式栈 |
| N²偏好矩阵 | 内存爆炸 | 使用稀疏表示 |
| ", |