hierarchical-matching-systems

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Hierarchical Matching Systems

分层匹配系统

This skill provides rigid diagnostic and architectural procedures for hierarchical matching systems. Follow checklists exactly—matching bugs often hide in skipped steps.
本技能为分层匹配系统提供严谨的诊断与架构流程。请严格遵循检查清单——匹配类bug常隐藏在跳过的步骤中。

Quick Reference

快速参考

  • Algorithm selection: See references/decision-guide.md
  • Algorithm details: See references/algorithms.md

  • 算法选择:参见 references/decision-guide.md
  • 算法细节:参见 references/algorithms.md

1. Problem Classification Checklist

1. 问题分类检查清单

Before any work, classify the problem. Check ALL that apply:
□ TWO-SIDED: Both sides have preferences (students↔schools, workers↔jobs)
□ ONE-SIDED: Only one side has preferences (tasks→workers, items→bins)
□ HIERARCHICAL: Entities exist at multiple levels (org→dept→team→person)
□ WEIGHTED: Matches have costs/scores to optimize
□ CONSTRAINED: Hard limits exist (capacity, exclusions, required pairings)
□ STABLE: Solution must resist defection (no blocking pairs)
□ OPTIMAL: Solution must minimize/maximize objective function
□ FUZZY: Entities may partially match (entity resolution, deduplication)
Classification determines algorithm family. Proceed to Section 2 for architecture or Section 3 for debugging.

开展任何工作前,先对问题进行分类。勾选所有适用项:
□ 双边匹配:双方均有偏好(学生↔学校、工人↔工作)
□ 单边匹配:仅一方有偏好(任务→工人、物品→容器)
□ 分层匹配:实体存在于多个层级(组织→部门→团队→个人)
□ 加权匹配:匹配带有成本/分数以优化
□ 约束匹配:存在硬性限制(容量、排除项、强制配对)
□ 稳定匹配:解决方案需避免“背叛”(无阻塞对)
□ 最优匹配:解决方案需最小化/最大化目标函数
□ 模糊匹配:实体可能部分匹配(实体解析、去重)
分类结果将决定算法选型。如需架构设计请进入第2节,如需调试请进入第3节。

2. Architecture Procedure

2. 架构设计流程

Follow these phases in order when designing a new matching system.
设计新匹配系统时,请按以下阶段依次进行。

Phase 2.1: Requirements Translation

阶段2.1:需求转化

Convert each business requirement into formal constraints:
Business RequirementConstraint TypeFormal Expression
"Each student gets one school"Capacity`
"Schools have seat limits"Capacity`
"Siblings must be together"Coupling
school(s1) = school(s2) if siblings(s1,s2)
"Student X cannot attend Y"Exclusion
(X, Y) ∉ matches
"Priority for residents"Preference ordering
rank(resident) < rank(non-resident)
Checklist:
□ List ALL business requirements
□ Classify each as: capacity | coupling | exclusion | ordering | soft preference
□ Identify conflicts between requirements (document tradeoffs)
□ Distinguish HARD constraints (must satisfy) from SOFT (optimize toward)
□ Validate translations with stakeholder examples
将每项业务需求转化为形式化约束:
业务需求约束类型形式化表达
"每位学生分配一所学校"容量约束`
"学校有招生名额限制"容量约束`
"兄弟姐妹必须同校"耦合约束
school(s1) = school(s2) if siblings(s1,s2)
"学生X不能就读Y校"排除约束
(X, Y) ∉ matches
"本地居民优先"偏好排序
rank(resident) < rank(non-resident)
检查清单:
□ 列出所有业务需求
□ 将每项需求分类为:容量约束 | 耦合约束 | 排除约束 | 排序约束 | 软偏好
□ 识别需求间的冲突(记录权衡方案)
□ 区分硬性约束(必须满足)与软性约束(优化目标)
□ 通过干系人示例验证转化结果

Phase 2.2: Algorithm Selection

阶段2.2:算法选择

Use references/decision-guide.md to select algorithm. Verify:
□ Algorithm handles all HARD constraints
□ Algorithm can optimize SOFT constraints (or document gaps)
□ Complexity acceptable for data size (see references/algorithms.md)
□ Stability requirements met (if two-sided)
□ Optimality requirements met (if weighted)
使用 references/decision-guide.md 选择算法,并验证:
□ 算法支持所有硬性约束
□ 算法可优化软性约束(或记录缺口)
□ 复杂度适配数据规模(参见references/algorithms.md)
□ 满足稳定性要求(若为双边匹配)
□ 满足最优性要求(若为加权匹配)

Phase 2.3: Data Model Design

阶段2.3:数据模型设计

Define entities, relationships, and preference representation:
□ Entity schema for each side (attributes, identifiers)
□ Preference representation (ranked list | score matrix | pairwise comparisons)
□ Constraint encoding (how exclusions/couplings are stored)
□ Hierarchy representation (if multi-level: tree | DAG | adjacency list)
□ Tie-breaking rules (deterministic ordering for equal preferences)
定义实体、关系与偏好表示方式:
□ 双方实体的 schema(属性、标识符)
□ 偏好表示形式(排名列表 | 分数矩阵 | 两两比较)
□ 约束编码方式(排除项/耦合关系的存储格式)
□ 层级表示方式(若为多层级:树结构 | 有向无环图 | 邻接表)
□ 平局处理规则(相同偏好下的确定性排序)

Phase 2.4: Interface Contracts

阶段2.4:接口契约

Specify inputs, outputs, and invariants:
Input Contract:
□ Preference format and validation rules
□ Constraint format and validation rules
□ Required vs optional fields
□ How missing preferences are handled (reject | default rank | exclude)
Output Contract:
□ Match format (pairs | assignment map | ranked list)
□ Unmatched entity handling (explicit list | null matches | error)
□ Match metadata (scores, stability proof, constraint satisfaction report)
Invariants:
□ Determinism: same input → same output (document randomness if any)
□ Completeness: all entities matched OR explicitly unmatched
□ Validity: all matches satisfy hard constraints
指定输入、输出与不变量:
输入契约:
□ 偏好格式与验证规则
□ 约束格式与验证规则
□ 必填与可选字段
□ 缺失偏好的处理方式(拒绝 | 默认排名 | 排除)
输出契约:
□ 匹配结果格式(配对 | 分配映射 | 排名列表)
□ 未匹配实体的处理方式(显式列表 | 空匹配 | 报错)
□ 匹配元数据(分数、稳定性证明、约束满足报告)
不变量:
□ 确定性:相同输入→相同输出(若存在随机性需记录)
□ 完整性:所有实体要么已匹配,要么显式标记为未匹配
□ 有效性:所有匹配均满足硬性约束

Phase 2.5: Testing Strategy

阶段2.5:测试策略

Define validation before implementation:
□ Unit tests for preference parsing and constraint validation
□ Property tests: stability, optimality, constraint satisfaction
□ Edge cases: empty inputs, single entity, all tied preferences
□ Regression tests from known-good examples
□ Performance benchmarks at target scale

在实现前定义验证方案:
□ 偏好解析与约束验证的单元测试
□ 属性测试:稳定性、最优性、约束满足性
□ 边缘案例:空输入、单个实体、所有偏好相同
□ 基于已知正确示例的回归测试
□ 目标规模下的性能基准测试

3. Debugging Procedure

3. 调试流程

Follow this diagnostic sequence for any matching issue. Do not skip steps.
遇到任何匹配问题,请遵循以下诊断流程,请勿跳过步骤。

Phase 3.1: Symptom Classification

阶段3.1:症状分类

Identify the symptom category:
SymptomCategoryGo To
Same inputs, different outputsINSTABILITY3.2
Matches violate business rulesCONSTRAINT VIOLATION3.3
Matches technically valid but "wrong"PREFERENCE MISALIGNMENT3.4
Errors with nested/hierarchical dataHIERARCHY BUG3.5
Poor performance at scalePERFORMANCE3.6
识别症状类别:
症状类别跳转至
相同输入产生不同输出不稳定性3.2
匹配结果违反业务规则约束违规3.3
匹配结果技术上有效但“不符合预期”偏好不一致3.4
嵌套/分层数据出现错误层级bug3.5
大规模场景下性能不佳性能问题3.6

Phase 3.2: Instability Diagnosis

阶段3.2:不稳定性诊断

Root causes of non-deterministic matches:
□ RANDOMNESS: Check for unseeded RNG in tie-breaking
   → Fix: Use deterministic tie-breaker (lexicographic ID, timestamp)

□ FLOATING POINT: Check score comparisons for floating point issues
   → Fix: Use epsilon comparison or integer scores

□ HASH ORDERING: Check if iteration order depends on hash maps
   → Fix: Sort keys before iteration

□ PARALLEL RACE: Check for concurrent modifications
   → Fix: Synchronize or use sequential processing

□ INPUT ORDERING: Check if algorithm is order-sensitive
   → Fix: Canonicalize input order (sort by ID)
Verification:
□ Run matching 10x with identical inputs
□ Diff all outputs
□ If any differ, add logging to identify divergence point
非确定性匹配的根本原因:
□ 随机性:检查平局处理中是否使用未设种子的RNG
   → 修复方案:使用确定性平局规则(字典序ID、时间戳)

□ 浮点数问题:检查分数比较是否存在浮点数精度问题
   → 修复方案:使用epsilon比较或整数分数

□ 哈希排序:检查迭代顺序是否依赖哈希映射
   → 修复方案:迭代前对键进行排序

□ 并行竞争:检查是否存在并发修改
   → 修复方案:同步处理或使用顺序执行

□ 输入顺序:检查算法是否对输入顺序敏感
   → 修复方案:规范输入顺序(按ID排序)
验证:
□ 使用相同输入运行匹配10次
□ 对比所有输出结果
□ 若存在差异,添加日志以定位分歧点

Phase 3.3: Constraint Violation Diagnosis

阶段3.3:约束违规诊断

Diagnostic sequence:
1. □ IDENTIFY: Which specific constraint is violated?
   → List the violated constraint and the violating match

2. □ TRACE: Where should constraint be enforced?
   → Map constraint to code location (filter | validation | algorithm step)

3. □ VERIFY ENCODING: Is constraint correctly represented?
   → Print constraint data structure, verify against requirement

4. □ VERIFY ENFORCEMENT: Is constraint checked at right time?
   → Add logging before/after enforcement point

5. □ CHECK ORDERING: Is constraint checked before conflicting decisions?
   → Trace decision sequence, verify constraint checked first

6. □ CHECK COMPLETENESS: Are all instances covered?
   → Enumerate all entities that should be constrained
Common failure patterns:
PatternSymptomFix
Late enforcementValid intermediate state, invalid finalMove check earlier
Partial coverageSome entities constrained, others notEnumerate all cases
Soft vs hard confusionConstraint violated for "better" matchReclassify as hard
Stale dataConstraint on outdated valuesRefresh before check
诊断流程:
1. □ 定位:明确违反了哪项具体约束?
   → 列出违规约束及对应的匹配结果

2. □ 追踪:约束应在何处执行?
   → 将约束映射到代码位置(过滤器 | 验证器 | 算法步骤)

3. □ 验证编码:约束是否正确表示?
   → 打印约束数据结构,与需求进行对比

4. □ 验证执行:约束是否在正确时机被检查?
   → 在执行点前后添加日志

5. □ 检查顺序:约束是否在冲突决策前被检查?
   → 追踪决策流程,验证约束先于冲突决策执行

6. □ 检查完整性:是否覆盖了所有实例?
   → 枚举所有应受约束的实体
常见失败模式:
模式症状修复方案
延迟执行中间状态有效,最终状态无效将检查提前
部分覆盖部分实体受约束,其他不受枚举所有场景
软硬混淆为了“更优”匹配违反约束重新归类为硬性约束
数据过期约束基于过时值检查前刷新数据

Phase 3.4: Preference Misalignment Diagnosis

阶段3.4:偏好不一致诊断

When matches are valid but don't reflect intended priorities:
1. □ EXTRACT: Get the actual preference data used
   → Log/print the exact preference structure at match time

2. □ COMPARE: Check against expected preferences
   → Side-by-side diff with business-stated priorities

3. □ TRACE TRANSFORMATION: Follow preference from input to algorithm
   → Log at each transformation step (parsing, normalization, scoring)

4. □ CHECK SCORING: Verify score calculation
   → Manual calculation for 2-3 example cases

5. □ CHECK AGGREGATION: If multi-criteria, verify combination
   → Test each criterion independently, then combined

6. □ CHECK NORMALIZATION: Verify scale/range handling
   → Check for min/max, z-score, or rank normalization bugs
Scoring function checklist:
□ Direction correct (higher = better or lower = better, consistently)
□ Scale appropriate (no single factor dominating)
□ Missing values handled (null → 0? → excluded? → default?)
□ Ties handled explicitly (not left to floating point chance)
□ Edge cases: extreme values, all same values, single candidate
当匹配结果有效但未反映预期优先级时:
1. □ 提取:获取匹配时实际使用的偏好数据
   → 记录/打印匹配时的精确偏好结构

2. □ 对比:与预期偏好进行检查
   → 将实际偏好与业务声明的优先级进行并排对比

3. □ 追踪转换:跟随偏好从输入到算法的全流程
   → 在每个转换步骤(解析、归一化、打分)添加日志

4. □ 检查打分:验证分数计算逻辑
   → 手动计算2-3个示例场景

5. □ 检查聚合:若为多准则,验证组合逻辑
   → 独立测试每个准则,再测试组合效果

6. □ 检查归一化:验证尺度/范围处理逻辑
   → 检查min/max、z-score或排名归一化是否存在bug
打分函数检查清单:
□ 方向正确(始终保持“越高越好”或“越低越好”)
□ 尺度合适(无单一因素主导结果)
□ 缺失值处理正确(null→0?→排除?→默认值?)
□ 平局处理明确(不依赖浮点数随机性)
□ 边缘案例:极端值、所有值相同、单个候选者

Phase 3.5: Hierarchy Traversal Diagnosis

阶段3.5:层级遍历诊断

For multi-level matching issues:
1. □ VISUALIZE: Draw the hierarchy with the problematic match
   → Tree diagram showing all levels and the match path

2. □ CHECK INHERITANCE: Do child constraints inherit from parent?
   → Verify constraint propagation rules

3. □ CHECK AGGREGATION: How do child preferences roll up?
   → Verify aggregation function (sum | max | weighted | majority)

4. □ CHECK LEVEL INTERACTION: Can matches cross levels?
   → Document allowed/forbidden cross-level matches

5. □ CHECK TRAVERSAL ORDER: Top-down or bottom-up?
   → Verify algorithm processes levels in intended order

6. □ CHECK PARTIAL MATCHES: Can a parent match without all children?
   → Document completeness requirements per level
Common hierarchy bugs:
BugSymptomFix
Missing propagationChild ignores parent constraintAdd inheritance logic
Double countingSame entity weighted multiple timesDeduplicate in aggregation
Level skippingMatch at wrong levelAdd level validation
Orphan handlingUnattached children cause errorsDefine orphan policy
针对多层级匹配问题:
1. □ 可视化:绘制包含问题匹配的层级结构
   → 树状图展示所有层级及匹配路径

2. □ 检查继承:子约束是否从父级继承?
   → 验证约束传播规则

3. □ 检查聚合:子偏好如何向上汇总?
   → 验证聚合函数(求和 | 最大值 | 加权 | 多数投票)

4. □ 检查层级交互:匹配是否允许跨层级?
   → 记录允许/禁止跨层级匹配的规则

5. □ 检查遍历顺序:自上而下还是自下而上?
   → 验证算法是否按预期顺序处理层级

6. □ 检查部分匹配:父级是否可在未匹配所有子级的情况下完成匹配?
   → 记录各层级的完整性要求
常见层级bug:
Bug症状修复方案
传播缺失子级忽略父级约束添加继承逻辑
重复计数同一实体被多次加权聚合时去重
层级跳过匹配发生在错误层级添加层级验证
孤立节点处理未关联的子级导致错误定义孤立节点处理策略

Phase 3.6: Performance Diagnosis

阶段3.6:性能诊断

For scale and speed issues:
1. □ PROFILE: Identify the slow component
   → Time each phase: input parsing, preference building, matching, output

2. □ COMPLEXITY CHECK: Verify actual vs expected complexity
   → Log iteration counts, compare to theoretical O(n)

3. □ MEMORY CHECK: Profile memory usage
   → Watch for preference matrix explosion (n² space)

4. □ ALGORITHM FIT: Verify algorithm appropriate for scale
   → See references/algorithms.md for complexity comparison

5. □ CACHING: Check for redundant computation
   → Log cache hits/misses for preference lookups

6. □ BATCH VS STREAMING: Check processing model
   → Full recomputation vs incremental updates

针对规模与速度问题:
1. □ 性能分析:定位缓慢组件
   → 统计各阶段耗时:输入解析、偏好构建、匹配计算、输出

2. □ 复杂度检查:验证实际复杂度与预期是否一致
   → 记录迭代次数,与理论O(n)对比

3. □ 内存检查:分析内存使用情况
   → 警惕偏好矩阵爆炸(n²空间复杂度)

4. □ 算法适配:验证算法是否适配当前规模
   → 参见references/algorithms.md的复杂度对比

5. □ 缓存检查:是否存在冗余计算
   → 记录偏好查询的缓存命中/未命中情况

6. □ 批量 vs 流式:检查处理模型
   → 全量重新计算 vs 增量更新

4. Testing & Validation Procedure

4. 测试与验证流程

4.1: Correctness Properties

4.1 正确性属性

Test these properties for every matching system:
□ DETERMINISM: run(input) = run(input) (10 trials minimum)
□ COMPLETENESS: all entities either matched or explicitly unmatched
□ VALIDITY: all matches satisfy all hard constraints
□ STABILITY (if applicable): no blocking pairs exist
□ OPTIMALITY (if applicable): objective function at expected value
所有匹配系统均需测试以下属性:
□ 确定性:run(input) = run(input)(至少运行10次)
□ 完整性:所有实体要么已匹配,要么显式标记为未匹配
□ 有效性:所有匹配均满足所有硬性约束
□ 稳定性(若适用):不存在阻塞对
□ 最优性(若适用):目标函数达到预期值

4.2: Stability Verification (Two-Sided Matching)

4.2 稳定性验证(双边匹配)

For stable matching, verify no blocking pairs:
python
undefined
对于稳定匹配,需验证不存在阻塞对:
python
undefined

Pseudocode - verify no blocking pair exists

伪代码 - 验证不存在阻塞对

for each unmatched_pair (a, b): if a prefers b over current_match(a): if b prefers a over current_match(b): FAIL: blocking pair found (a, b)
undefined
□ Enumerate all non-matched pairs □ Check mutual preference for each □ Report any blocking pairs found □ For large instances, sample-check (document coverage)
undefined
for each unmatched_pair (a, b): if a prefers b over current_match(a): if b prefers a over current_match(b): FAIL: 发现阻塞对 (a, b)
undefined
□ 枚举所有未配对的实体组合 □ 检查每对组合的相互偏好 □ 报告所有发现的阻塞对 □ 对于大规模实例,采用抽样检查(记录覆盖范围)
undefined

4.3: Constraint Satisfaction Verification

4.3 约束满足验证

□ List all hard constraints
□ For each match, verify against each constraint
□ Generate constraint satisfaction report
□ Flag any violations with specific match and constraint
□ 列出所有硬性约束
□ 针对每个匹配结果,验证其符合所有约束
□ 生成约束满足报告
□ 标记任何违规的匹配结果及对应的约束

4.4: Edge Case Test Suite

4.4 边缘案例测试套件

Mandatory test cases:
□ Empty input (no entities on one or both sides)
□ Single entity (one-to-one degenerate case)
□ All identical preferences (maximum tie scenario)
□ Mutually exclusive preferences (everyone wants same thing)
□ Impossible constraints (unsatisfiable, should error clearly)
□ Maximum capacity (all slots exactly filled)
□ Minimum capacity (barely enough slots)
□ Self-referential (can entity match itself? test boundary)
□ Circular preferences (A→B→C→A)
必选测试用例:
□ 空输入(一方或双方无实体)
□ 单个实体(一对一退化场景)
□ 所有偏好相同(最大平局场景)
□ 互斥偏好(所有人都想要同一个对象)
□ 不可能约束(无法满足,应清晰报错)
□ 最大容量(所有名额刚好填满)
□ 最小容量(仅够满足极少需求)
□ 自引用(实体能否与自身匹配?测试边界)
□ 循环偏好(A→B→C→A)

4.5: Regression Test Maintenance

4.5 回归测试维护

□ Capture real production cases that revealed bugs
□ Minimize to smallest reproducing example
□ Document expected behavior explicitly
□ Run on every change to matching logic

□ 捕获暴露bug的真实生产场景
□ 简化为最小复现示例
□ 明确记录预期行为
□ 每次修改匹配逻辑时运行回归测试

5. Review Checklist

5. 评审检查清单

When reviewing matching system code or design:
评审匹配系统代码或设计时:

5.1: Design Review

5.1 设计评审

□ Problem correctly classified (Section 1)
□ Algorithm appropriate for problem class (references/decision-guide.md)
□ All business requirements mapped to constraints (Section 2.1)
□ Hard vs soft constraints clearly distinguished
□ Tie-breaking is deterministic and documented
□ Hierarchy semantics defined (if applicable)
□ 问题分类正确(第1节)
□ 算法适配问题类别(参见references/decision-guide.md)
□ 所有业务需求均映射为约束(第2.1节)
□ 硬性与软性约束区分明确
□ 平局处理规则确定且已记录
□ 层级语义已定义(若适用)

5.2: Implementation Review

5.2 实现评审

□ Preference representation matches algorithm requirements
□ Constraints enforced at correct point in algorithm
□ No hidden randomness (unseeded RNG, hash iteration)
□ Floating point comparison handled correctly
□ Edge cases handled (empty, single, ties)
□ Error messages identify specific constraint violations
□ 偏好表示方式符合算法要求
□ 约束在算法的正确阶段执行
□ 无隐藏随机性(未设种子的RNG、哈希迭代)
□ 浮点数比较处理正确
□ 边缘案例已处理(空输入、单个实体、平局)
□ 错误消息明确指出具体约束违规

5.3: Testing Review

5.3 测试评审

□ All properties from 4.1 tested
□ Edge cases from 4.4 covered
□ Performance benchmarked at realistic scale
□ Regression tests exist for past bugs

□ 已测试4.1节中的所有属性
□ 已覆盖4.4节中的边缘案例
□ 已在真实规模下进行性能基准测试
□ 已为过往bug建立回归测试

Appendix: Common Anti-Patterns

附录:常见反模式

Anti-PatternProblemSolution
Greedy first-comeOrder-dependent, non-optimalUse proper algorithm
Score = sum(all factors)One factor dominatesNormalize scales
Retry until validNon-deterministic, slowFix constraint order
Global preference cacheStale across updatesInvalidate on change
String matching for entitiesCase/whitespace bugsUse canonical IDs
Float equality for tiesNon-deterministicUse epsilon or integer
Recursive hierarchy walkStack overflow riskUse iterative with explicit stack
N² preference matrixMemory explosionUse sparse representation
反模式问题解决方案
贪心先到先得依赖输入顺序,非最优使用标准算法
分数=所有因素之和单一因素主导结果归一化尺度
重试直到有效非确定性、缓慢修复约束执行顺序
全局偏好缓存更新后数据过期变更时失效缓存
实体字符串匹配大小写/空格bug使用标准ID
浮点数判断平局非确定性使用epsilon或整数
递归层级遍历栈溢出风险使用迭代式显式栈
N²偏好矩阵内存爆炸使用稀疏表示
",