manage-tech-debt

Manage Tech Debt

Overview

Track, categorize, and prioritize technical debt using a structured register. This is an ongoing management skill, not a one-time audit. It uses Martin Fowler's tech debt quadrant to classify debt, assigns an "interest rate" (how much the debt slows us down per sprint), estimates payoff effort, and prioritizes by impact-to-effort ratio aligned with upcoming work.

Workflow

  1. Read existing register -- Check for an existing tech debt register at `.chalk/docs/engineering/*_tech_debt_register.md`. If one exists, read it and use it as the starting point. The register is a living document -- update it, do not create duplicates.
  2. Read project context -- Read `.chalk/docs/engineering/` for:
    • Architecture docs to understand system boundaries
    • ADRs and RFCs to understand intentional design choices (not all suboptimal code is debt)
    • Prior audit reports that may have flagged tech debt
    • Upcoming roadmap or sprint plans to identify strategic alignment opportunities
  3. Determine the operation -- Based on `$ARGUMENTS` and conversation context:
    • Add: Add new debt items to the register
    • Assess: Scan the codebase for a specific area and identify debt
    • Prioritize: Re-rank existing items based on current context
    • Review: Show the current register with updated priorities
    • Retire: Mark items as resolved and document the resolution
  4. For new debt items -- Classify using the Tech Debt Quadrant:
    | | Deliberate | Inadvertent |
    | --- | --- | --- |
    | Reckless | "We don't have time for tests" | "What are integration tests?" |
    | Prudent | "We'll ship now and refactor before scaling" | "Now we know how this should have been built" |
    • Reckless/Deliberate: Knowingly cut corners with no plan to address it. Highest urgency.
    • Reckless/Inadvertent: Did not know better at the time. Needs education + fix.
    • Prudent/Deliberate: Intentional tradeoff with a plan. Track and execute the plan.
    • Prudent/Inadvertent: Learned a better approach after building. Normal and healthy.
  5. For each debt item, capture:
    • Category: Architecture, Code Quality, Testing, Infrastructure, Dependencies, Documentation, Security
    • Quadrant: Which cell in the tech debt quadrant
    • Description: Concrete description of what the debt is
    • Location: Files, modules, or systems affected
    • Impact (Interest Rate): How much this slows the team down per sprint, quantified:
      • Hours of extra work per sprint caused by this debt
      • Number of bugs per quarter attributable to this debt
      • Developer experience friction (onboarding time, confusion, workarounds)
    • Payoff Effort: Estimated effort to resolve (T-shirt size + approximate hours/days)
    • Business Justification: Why fixing this matters in business terms, not just engineering terms
    • Strategic Alignment: Does fixing this unblock or de-risk upcoming planned work?
  6. Prioritize the register -- Rank items by:
    • Impact/Effort ratio: High impact, low effort items go first
    • Strategic alignment: Debt that blocks upcoming planned work gets a priority boost
    • Coupling to upcoming work: If you are already changing the affected area, fix the debt now (lowest marginal cost)
    • Risk: Debt that could cause incidents or data loss gets a priority boost regardless of effort
  7. Write or update the register -- Save to `.chalk/docs/engineering/tech_debt_register.md`. If updating an existing register, preserve history (do not delete resolved items; mark them as resolved with the date).
  8. Summarize -- Tell the user what was added/changed, the current top 5 priorities, and any items that align with upcoming work.
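
Steps 5 and 6 can be sketched in Python as a small record plus a ranking function. This is only an illustration under stated assumptions: the `DebtItem` field names and the two boost multipliers are hypothetical and not part of the skill, which says only that strategic alignment and risk earn a "priority boost".

```python
from dataclasses import dataclass
from enum import Enum


class Quadrant(Enum):
    RECKLESS_DELIBERATE = "Reckless / Deliberate"
    RECKLESS_INADVERTENT = "Reckless / Inadvertent"
    PRUDENT_DELIBERATE = "Prudent / Deliberate"
    PRUDENT_INADVERTENT = "Prudent / Inadvertent"


@dataclass
class DebtItem:
    id: str
    category: str
    quadrant: Quadrant
    interest_hrs_per_sprint: float     # impact ("interest rate")
    payoff_effort_days: float          # estimated effort to resolve
    blocks_planned_work: bool = False  # strategic alignment
    incident_risk: bool = False        # could cause incidents or data loss


# Hypothetical multipliers chosen for illustration only.
ALIGNMENT_BOOST = 1.5
RISK_BOOST = 2.0


def priority_score(item: DebtItem) -> float:
    """Impact/effort ratio, boosted for strategic alignment and risk."""
    # Floor the effort so near-zero estimates do not blow up the ratio.
    score = item.interest_hrs_per_sprint / max(item.payoff_effort_days, 0.1)
    if item.blocks_planned_work:
        score *= ALIGNMENT_BOOST
    if item.incident_risk:
        score *= RISK_BOOST
    return score


def prioritize(items: list[DebtItem]) -> list[DebtItem]:
    """Return items ranked highest priority first."""
    return sorted(items, key=priority_score, reverse=True)
```

A multiplicative boost keeps the impact/effort ratio as the base signal. A team that wants risk to win "regardless of effort" could instead sort risky items into a separate tier ahead of everything else.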

Filename Convention

tech_debt_register.md
A project should have exactly one tech debt register at a fixed path (e.g., `.chalk/docs/engineering/tech_debt_register.md`). If one already exists, update it instead of creating a new one.
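
The "exactly one register" rule can be checked before writing. The sketch below is a minimal illustration and not part of the skill: the function name `resolve_register_path` is hypothetical, the glob is widened to `*tech_debt_register.md` so it also matches the unprefixed default name, and raising on multiple matches is an assumption about how duplicates should be handled.

```python
from pathlib import Path

ENGINEERING_DOCS = Path(".chalk/docs/engineering")
DEFAULT_NAME = "tech_debt_register.md"


def resolve_register_path(project_root: Path) -> Path:
    """Return the existing register if there is one, else the default path."""
    docs = project_root / ENGINEERING_DOCS
    # Matches tech_debt_register.md and prefixed names ending in
    # _tech_debt_register.md.
    existing = sorted(docs.glob("*tech_debt_register.md")) if docs.is_dir() else []
    if len(existing) > 1:
        names = [p.name for p in existing]
        raise RuntimeError(f"Multiple registers found, expected one: {names}")
    return existing[0] if existing else docs / DEFAULT_NAME
```

Returning the default path when nothing exists lets the caller use the same code path for creating and updating the register.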

Tech Debt Register Format

Tech Debt Register

Last updated: <YYYY-MM-DD>
Total items: <active count> active, <resolved count> resolved

Summary

By Category

| Category | Count | Total Interest (hrs/sprint) |
| --- | --- | --- |
| Architecture | 3 | 8 |
| Code Quality | 5 | 6 |
| Testing | 2 | 4 |
| Dependencies | 1 | 2 |

Top 5 Priorities

| # | Item | Interest Rate | Effort | Ratio | Aligned With |
| --- | --- | --- | --- | --- | --- |
| 1 | Payment retry logic | 4 hrs/sprint | 2 days | High | Q2 billing overhaul |
| 2 | Shared validation | 3 hrs/sprint | 1 day | High | API v2 migration |
| 3 | Test database setup | 2 hrs/sprint | 3 days | Medium | |
| 4 | Legacy auth module | 3 hrs/sprint | 5 days | Medium | Auth service RFC |
| 5 | Missing indexes | 2 hrs/sprint | 0.5 day | High | |

Active Debt Items

TD-001: <Title>

  • Category: Architecture
  • Quadrant: Prudent / Deliberate
  • Added: <YYYY-MM-DD>
  • Location: `src/payments/retry.ts`, `src/payments/processor.ts`
  • Description: Payment retry logic uses a simple loop with fixed delays instead of exponential backoff with jitter. This was acceptable at low volume but causes thundering herd problems under load.
  • Impact (Interest Rate): ~4 hours/sprint investigating timeout-related payment failures. 2-3 support tickets per week from merchants about failed retries.
  • Payoff Effort: M (2 days) -- replace retry loop with a proper backoff library, add circuit breaker, update tests.
  • Business Justification: Payment reliability directly affects merchant trust and revenue. Each failed retry costs an average of $47 in lost transaction value.
  • Strategic Alignment: Directly relevant to Q2 billing overhaul. Fixing now reduces risk for that project.
  • Priority: 1 (High impact/effort ratio + strategic alignment)

TD-002: <Title>

...

Resolved Debt Items

TD-008: <Title> [RESOLVED <YYYY-MM-DD>]

  • Resolution: Refactored in PR #234. Replaced manual SQL with query builder.
  • Actual effort: 1.5 days (estimated: 2 days)
  • Outcome: Eliminated 2 hrs/sprint of debugging SQL-related issues.
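
The "By Category" summary in the register format above can be regenerated from the active items instead of being maintained by hand. The following is a rough sketch, assuming each item is a plain dict with hypothetical `category` and `interest_hrs_per_sprint` keys; the skill itself does not prescribe a data model.

```python
from collections import defaultdict


def render_category_summary(items: list[dict]) -> str:
    """Build the 'By Category' table from active debt items."""
    counts: dict[str, int] = defaultdict(int)
    interest: dict[str, float] = defaultdict(float)
    for item in items:
        counts[item["category"]] += 1
        interest[item["category"]] += item["interest_hrs_per_sprint"]

    lines = [
        "| Category | Count | Total Interest (hrs/sprint) |",
        "| --- | --- | --- |",
    ]
    # Highest total interest first, so the costliest category leads the table.
    for category in sorted(counts, key=lambda c: interest[c], reverse=True):
        lines.append(f"| {category} | {counts[category]} | {interest[category]:g} |")
    return "\n".join(lines)
```

Deriving the summary on every update keeps the counts consistent with the item list when items are added or retired.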

Interest Rate Guidelines

The "interest rate" is the ongoing cost of not fixing the debt. Quantify it as concretely as possible:
| Interest Level | Hours/Sprint | Characteristics |
| --- | --- | --- |
| Critical | 8+ hrs | Causes incidents, blocks features, developers actively work around it every sprint |
| High | 4-8 hrs | Regular source of bugs, significant developer friction, slows multiple features |
| Medium | 2-4 hrs | Occasional bugs, noticeable friction, slows some features |
| Low | 0.5-2 hrs | Minor annoyance, rarely causes issues but adds up over time |
| Negligible | <0.5 hrs | Aesthetic concern, not worth prioritizing unless zero-cost to fix |
If you cannot estimate the interest rate, the debt is not well-understood enough to prioritize. Investigate further before adding it to the register.
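
As a minimal sketch, the interest levels can be expressed as a lookup on the hours-per-sprint estimate. The table's ranges share their endpoints (2, 4, and 8 hours each appear in two rows), so the code assumes a boundary value belongs to the higher level; that tie-break is an assumption, not something the guidelines state.

```python
def interest_level(hours_per_sprint: float) -> str:
    """Map an hours-per-sprint estimate to an interest level."""
    if hours_per_sprint < 0:
        raise ValueError("hours_per_sprint must be non-negative")
    # Assumption: a value on a shared boundary goes to the higher level.
    if hours_per_sprint >= 8:
        return "Critical"
    if hours_per_sprint >= 4:
        return "High"
    if hours_per_sprint >= 2:
        return "Medium"
    if hours_per_sprint >= 0.5:
        return "Low"
    return "Negligible"
```

For example, the ~4 hours/sprint recorded for TD-001 maps to "High" under this tie-break.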

Effort Estimation

| Size | Duration | Characteristics |
| --- | --- | --- |
| XS | < 2 hours | Single file change, localized, no risk |
| S | 0.5-1 day | Few files, well-understood, low risk |
| M | 1-3 days | Multiple files, needs testing, moderate risk |
| L | 3-5 days | Cross-module changes, needs migration, high risk |
| XL | 1-2 weeks | Architectural change, needs RFC or ADR, phased rollout |
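
The size table can be turned into a small helper for keeping estimates consistent. This is a sketch under assumptions that the table does not settle: a working day is taken as 8 hours, estimates between 2 hours and half a day fall into S, a value on a shared boundary goes to the smaller size, and anything above 5 days is treated as XL.

```python
HOURS_PER_DAY = 8  # assumption: one working day is 8 hours


def tshirt_size(effort_days: float) -> str:
    """Map an effort estimate in days to a T-shirt size."""
    if effort_days <= 0:
        raise ValueError("effort_days must be positive")
    if effort_days * HOURS_PER_DAY < 2:
        return "XS"
    if effort_days <= 1:
        return "S"
    if effort_days <= 3:
        return "M"
    if effort_days <= 5:
        return "L"
    # Assumption: everything above 5 days is XL; an estimate beyond
    # 2 weeks probably needs to be split into smaller items.
    return "XL"
```

With these choices, the 2-day payoff estimate for TD-001 comes out as M, matching the example register entry.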

When to Address Debt

Fix Now (do not add to register)

  • Security vulnerabilities
  • Data corruption risks
  • Issues causing customer-visible incidents

Fix When Touching the Area

  • Debt with medium interest rate in code you are already changing
  • Test gaps for code you are modifying
  • Stale documentation for features you are updating

Schedule Explicitly

  • High interest rate items blocking planned work
  • Items with high impact/effort ratio
  • Debt that will get more expensive to fix over time (coupling is increasing)

Accept and Document

  • Low interest rate in stable code that rarely changes
  • Prudent/deliberate debt where the planned payoff timeline has not arrived
  • Debt where the fix effort exceeds the projected lifetime cost
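
The four buckets above can be read as a decision procedure. The sketch below is one possible encoding, not the skill's definition: the `DebtSignal` fields, the order in which the rules are checked, the 4 hrs/sprint threshold for "high interest", and the fall-through to "accept and document" are all assumptions made for illustration.

```python
from dataclasses import dataclass

HIGH_INTEREST_HRS = 4  # assumed lower bound of a "high" interest rate


@dataclass
class DebtSignal:
    urgent_risk: bool              # security hole, data corruption risk, or customer-visible incident
    interest_hrs_per_sprint: float
    payoff_effort_hrs: float
    touching_area_now: bool        # the affected code is already being changed
    blocks_planned_work: bool
    remaining_sprints: int         # hypothetical: how long the code is expected to stay in use


def decide(signal: DebtSignal) -> str:
    """Pick one of the four handling decisions for a debt item."""
    if signal.urgent_risk:
        return "fix now"
    # Projected lifetime cost of leaving the debt in place.
    lifetime_cost_hrs = signal.interest_hrs_per_sprint * signal.remaining_sprints
    if signal.payoff_effort_hrs > lifetime_cost_hrs:
        return "accept and document"
    if signal.touching_area_now:
        return "fix when touching the area"
    if signal.blocks_planned_work or signal.interest_hrs_per_sprint >= HIGH_INTEREST_HRS:
        return "schedule explicitly"
    # Assumption: anything left is low-cost debt in code nobody is changing.
    return "accept and document"
```

Checking the lifetime-cost rule before the others means a fix that can never pay for itself is not scheduled just because the area is being touched; a team may reasonably prefer a different order.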

Codebase Assessment Checklist

When asked to assess a specific area for tech debt, check:
| Area | What to Look For |
| --- | --- |
| Architecture | Circular dependencies, god classes/modules, missing abstraction layers, tight coupling |
| Code Quality | Code duplication (>3 instances), long functions (>50 lines), deep nesting (>3 levels), magic numbers |
| Testing | Missing tests for critical paths, brittle tests, slow test suite, no integration tests |
| Dependencies | Outdated packages (>2 major versions behind), packages with known CVEs, abandoned packages |
| Infrastructure | Manual deployment steps, missing monitoring, no alerting, single points of failure |
| Documentation | Outdated architecture docs, missing API docs, no onboarding guide, stale comments |
| Security | Hardcoded secrets, missing input validation, outdated auth patterns, no rate limiting |
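
Some of the Code Quality checks can be automated. As one illustration, the sketch below flags functions longer than the 50-line threshold using Python's standard `ast` module. It applies only to Python source; a codebase in another language would need that language's own parser, and the function name `find_long_functions` is hypothetical.

```python
import ast

MAX_FUNCTION_LINES = 50  # threshold from the checklist


def find_long_functions(
    source: str, max_lines: int = MAX_FUNCTION_LINES
) -> list[tuple[str, int]]:
    """Return (name, line_count) for each function longer than max_lines."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Count from the def line through the last line of the body.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                findings.append((node.name, length))
    return findings
```

A finding from a check like this is a prompt to investigate, not a debt item by itself; it still needs an interest rate and a business justification before it goes into the register.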

Anti-patterns

  • Infinite list with no prioritization -- A tech debt register with 50 items and no ranking is a graveyard, not a management tool. Every item must have an interest rate and effort estimate. Rank by impact/effort ratio. If the list exceeds 20 active items, the bottom items should be evaluated for removal.
  • No business justification -- "This code is ugly" is not a business justification. "This code causes 3 hours of debugging per sprint and has led to 2 production incidents in the last quarter" is. Every debt item must justify its existence in terms a product manager would understand.
  • "Refactor everything" -- Not all old code is debt. Stable code that works, is tested, and rarely needs changes is not debt even if it uses old patterns. Debt is code that actively costs you ongoing effort. Do not confuse "not how I would write it today" with "tech debt."
  • Debt without estimated interest rate -- If you cannot estimate how much a debt item costs per sprint, you do not understand it well enough to manage it. Investigate and quantify before adding it to the register. "It feels slow" is not a measurement.
  • One-time cleanup events -- "Tech debt sprint" or "cleanup week" treats debt as a batch problem. Debt is continuous -- address the highest-impact items every sprint as part of regular work. Budget 15-20% of sprint capacity for debt reduction.
  • Ignoring strategic alignment -- A medium-priority debt item that blocks an upcoming Q3 feature should be fixed now, not after Q3 launches. Always cross-reference the debt register with the product roadmap.
  • Never resolving items -- If the register only grows and never shrinks, it loses credibility. Track resolutions, celebrate them, and measure the actual payoff vs. estimated payoff. This builds trust in the system.
  • Adding debt without a decision -- Every new debt item should have an initial decision: fix now, fix when touching, schedule, or accept. Items without a decision sit in limbo and clutter the register.
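
Several of these anti-patterns are mechanical enough to check automatically. The following is a rough sketch assuming active items are plain dicts with hypothetical keys (`id`, `interest_hrs_per_sprint`, `business_justification`, `decision`); it flags an oversized register and items that are missing an interest rate, a justification, or an initial decision.

```python
MAX_ACTIVE_ITEMS = 20
VALID_DECISIONS = {"fix now", "fix when touching", "schedule", "accept"}


def lint_register(active_items: list[dict]) -> list[str]:
    """Return warnings for register anti-patterns that can be detected mechanically."""
    warnings = []
    if len(active_items) > MAX_ACTIVE_ITEMS:
        warnings.append(
            f"{len(active_items)} active items; evaluate the lowest-ranked for removal"
        )
    for item in active_items:
        if item.get("interest_hrs_per_sprint") is None:
            warnings.append(f"{item['id']}: no estimated interest rate")
        if not item.get("business_justification"):
            warnings.append(f"{item['id']}: no business justification")
        if item.get("decision") not in VALID_DECISIONS:
            warnings.append(f"{item['id']}: no initial decision")
    return warnings
```

A check like this catches missing fields only. Whether a justification is written in terms a product manager would understand still needs a human read.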