capacity-planner

Sizing tool for ops teams that handle queued work — Support, CX, Customer Success, BizOps, IT ops, Finance ops. Built on Erlang-C queueing theory, Little's Law, and the operational-leadership canon (Fournier, Larson, Cleveland, Reinertsen). Deterministic, stdlib-only, no LLM calls.

Purpose

You are an ops leader scaling a team from 15 to 35 people with no idea how the 35-person org will actually behave at peak load. Or you are at 88% utilization and SLA is starting to slip. Or you have a hiring budget approved and need to sequence it across four quarters without burning out the existing team. This skill answers those questions with arithmetic, not vibes.

It produces three artifacts:

Capacity sizing at 70/80/90% utilization against P50/P90/P99 demand, with P(SLA breach) at each point and a SAFE/WATCH/AT_RISK/CRITICAL risk band.
Utilization health at the per-member traffic-light level plus a team verdict (HEALTHY/SQUEEZED/OVERLOADED/UNBALANCED).
12-month quarterly hiring plan accounting for ramp curves, attrition, QoQ demand growth, and span-of-control manager triggers.

When to use

Annual ops capacity planning (October-November for the following fiscal year).
Quarterly re-sizing if demand changed >15% or attrition spiked.
Pre-budget defense — the math that justifies the headcount ask to your CFO.
Diagnostic when an ops team is missing SLA and you need to know whether it's a sizing problem, a process problem, or a bottleneck problem.
M&A / new-segment launch modeling — sizing a new team or combined org.

Workflow

1. Intake demand. Pull P50/P90/P99 daily ticket/case volume from your work system (Zendesk, Intercom, JSM, ServiceNow, Salesforce). If you only have averages, stop and pull the distribution. Single-point demand estimates are the most expensive anti-pattern in ops.
2. Model throughput. Run `capacity_modeler.py` with your demand, AHT, SLA target, current FTE, and shrinkage. Use `--profile` for your function (support / cx / bizops / finance-ops / it-ops). Read the 80%-utilization row: that's your sizing point.
3. Flag utilization risk. Run `utilization_analyzer.py` against your current team's actual utilization data. Anyone sustained above 85% is a throughput-collapse risk per Reinertsen. A spread of more than 30 percentage points across the team means UNBALANCED; fix that before hiring.
4. Sequence hiring. Run `hiring_sequencer.py` with current FTE, target EOY, ramp time, attrition, and growth. It will front-load hires (Q1 35%, Q4 15%), apply ramp curves, and trigger a manager hire when span of control crosses 7 ICs/manager.
5. Walk the Forcing-question library (see below). One question at a time. Do not skip ahead. Answers must be written down before you commit the plan.
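To make the sequencing step concrete, here is a minimal sketch of front-loaded quarterly allocation plus a span-of-control check. It is not the logic of `hiring_sequencer.py`: the Q1 (35%) and Q4 (15%) shares and the 7 ICs/manager limit come from the text, while the Q2/Q3 shares (30%/20%), the function names, and the rounding rule are assumptions made for illustration. Ramp curves and attrition are deliberately left out.

```python
# Percent of annual hires per quarter. Q1 and Q4 are from the text;
# Q2 and Q3 are ASSUMED so that the shares sum to 100.
QUARTER_SHARE_PCT = [35, 30, 20, 15]
SPAN_LIMIT = 7  # ICs per manager, from the text


def split_hires(total_hires):
    """Split annual hires across four quarters using largest-remainder rounding."""
    scaled = [total_hires * pct for pct in QUARTER_SHARE_PCT]
    plan = [value // 100 for value in scaled]
    leftover = total_hires - sum(plan)
    # Give leftover hires to the quarters with the largest fractional part.
    by_remainder = sorted(range(4), key=lambda q: scaled[q] % 100, reverse=True)
    for quarter in by_remainder[:leftover]:
        plan[quarter] += 1
    return plan


def manager_trigger_quarter(current_ics, managers, plan):
    """Return the first quarter (1-4) where ICs per manager exceeds the limit, else None."""
    ics = current_ics
    for quarter, hires in enumerate(plan, start=1):
        ics += hires
        if ics / managers > SPAN_LIMIT:
            return quarter
    return None


plan = split_hires(11)
print(plan, manager_trigger_quarter(10, 2, plan))
```

Integer percentages are used instead of floats so that the per-quarter rounding is exact and the plan always sums to the requested total.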

Scripts

`scripts/capacity_modeler.py`: Erlang-C sizing with shrinkage adjustment and P50/P90/P99 breach probabilities. Use `--profile` for industry defaults.
`scripts/utilization_analyzer.py`: per-member traffic light plus team-level health verdict with variance detection.
`scripts/hiring_sequencer.py`: 12-month quarterly plan with ramp, attrition, growth, max-hires-per-quarter constraint, and manager-trigger logic.

All three accept `--input <path>` (JSON), `--output {markdown,json}`, `--sample` (built-in example), and `--help`. Stdlib only.
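As an illustration of the traffic-light and verdict idea, the sketch below classifies members and a team from utilization fractions. It is not the code of `utilization_analyzer.py`. The 85% red line and the 30-point spread rule come from this document; the 75% yellow line, the use of the team mean for SQUEEZED/OVERLOADED, and all function names are assumptions.

```python
RED_ABOVE = 0.85     # from the text: sustained >85% is a throughput-collapse risk
YELLOW_ABOVE = 0.75  # ASSUMED threshold, for illustration only
MAX_SPREAD = 0.30    # from the text: spread >30 points means UNBALANCED


def traffic_light(utilization):
    """Classify one member's sustained utilization (0.0-1.0)."""
    if utilization > RED_ABOVE:
        return "RED"
    if utilization > YELLOW_ABOVE:
        return "YELLOW"
    return "GREEN"


def team_verdict(utilization_by_member):
    """Return HEALTHY, SQUEEZED, OVERLOADED, or UNBALANCED for a team."""
    values = list(utilization_by_member.values())
    # Check balance first: an uneven team should be rebalanced before hiring.
    if max(values) - min(values) > MAX_SPREAD:
        return "UNBALANCED"
    mean = sum(values) / len(values)
    if mean > RED_ABOVE:
        return "OVERLOADED"
    if mean > YELLOW_ABOVE:
        return "SQUEEZED"
    return "HEALTHY"


team = {"ana": 0.91, "bo": 0.58, "cy": 0.80}  # hypothetical data
print({name: traffic_light(u) for name, u in team.items()}, team_verdict(team))
```

Checking the spread before the mean mirrors the workflow's advice to fix imbalance before hiring.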

References

`references/queueing_theory_canon.md`: Erlang, Little, Hopp & Spearman, Reinertsen, Kingman, Cleveland, ITIL, Armony et al. (8 sources). The math.
`references/ops_workforce_planning_canon.md`: Fournier, Larson, Google SRE Workbook, Frei, Lawler, Bersin, Gartner, Grove (8 sources). The people factors.
`references/capacity_anti_patterns.md`: 11 named anti-patterns with cited sources, tool guards, and the meta-discipline that Lencioni + Goldratt + Christensen impose. (8+ named sources.)

Assets

`assets/capacity_brief_template.md`: 20-minute fill-out template with JSON skeletons for all three tools and an output checklist.

Assumptions

This skill assumes:

Work is queued (tickets, cases, work items) — not project-style. If your team's work isn't queued, this is the wrong skill.
Demand has a stationary-enough distribution within a quarter. Step-changes (new product launch, M&A, regulatory shift) require re-running mid-quarter.
You have at least 90 days of historical demand data to compute P50/P90/P99. If not, generate the distribution from your sales / user-base forecast first.
Service is single-class within a queue. If you have hard priority tiers (P1/P2/P3 with class-specific SLAs), model each as a separate queue and sum.
Channels are modeled coherently. Multi-channel teams use the appropriate `--profile` with built-in shrinkage premium.

Anti-patterns

See `references/capacity_anti_patterns.md` for the full taxonomy with sources. Top eight:

Plan-to-100%-utilization (Reinertsen Principle 12)
Treat-ramp-as-instant (Larson)
Ignore-attrition-in-12-month-plan (Bersin)
Hire-ICs-forever-with-no-manager-trigger (Fournier)
Size-to-P50-demand-only (Cleveland)
No-shrinkage-adjustment (Cleveland, SRE Workbook)
Single-channel-model-for-multi-channel-work (Gartner, Kingman)
No-surge-plan-for-P99-events (Hopp & Spearman, Reinertsen)

Distinct from

`c-level-advisor/vpe-advisor` measures engineering throughput via the four DORA metrics, story points, deployment frequency, and cycle-time bottlenecks. It is for engineering teams shipping code. This skill is for ops teams handling tickets/cases. Different unit of work, different math (Erlang-C vs. DORA), different bottleneck (queueing-blind staffing vs. WIP + lead time).
`c-level-advisor/chro-advisor` does strategic workforce planning (1-5 year capability portfolios, talent supply, leadership succession). This skill does operational 0-12 month capacity sizing against demand. Per Lawler: conflating them leads you to hire for the wrong jobs.
`project-management/*` tracks delivery throughput on projects (Jira velocity, sprint capacity). This skill sizes around steady-state queued work.
Sibling `process-mapper` finds the bottleneck. This skill sizes the team around a known bottleneck. Order of operations: process-mapper first → capacity-planner second. Hiring around the wrong constraint wastes the hires.
`business-growth/cs-coverage` (if it exists) sizes Customer Success coverage by ARR/CSM ratio and segment. This skill sizes by queued work volume (tickets, cases, escalations). For a CS team that handles both relationship work AND a ticket queue, run both.

Forcing-question library (Matt Pocock grill discipline)

Discipline: walk these one at a time. Do not skip ahead. Answers must be written down. If you can't answer one, that is your next investigation.

Q1 — "What is your bottleneck, and have you confirmed it empirically?"

Recommended answer: a named, measured stage in the workflow with queue-time data showing where work waits. Not a vibe. Not "escalations take too long". An actual measured queue.

Why it's the first question: Goldratt (The Goal, 1984): every system has exactly one binding constraint at a time. Sizing around the wrong constraint wastes hires entirely. If you do not know your bottleneck, run `process-mapper` BEFORE this skill.

Canon: Eli Goldratt, The Goal (1984); Reinertsen, Principles of Product Development Flow (2009).

Q2 — "What service trade-off are you accepting?"

Recommended answer: a written, explicit choice — fast vs. empathetic, broad vs. deep, low-cost vs. high-quality. Frances Frei is unambiguous: you cannot win all four. The team that tries wins zero.

Why it matters: AHT, SLA, and shrinkage inputs are the operational expression of this trade-off. If they don't agree (e.g., you set AHT for "empathy" but SLA for "speed"), the plan is internally inconsistent.

Canon: Frances Frei & Anne Morriss, Uncommon Service (HBR Press, 2012).

Q3 — "What's your demand P90, and what's the gap to your P99?"

Recommended answer: two specific numbers from the last 90 days of data, with the calendar context of each (e.g., "P90 was 480 tickets/day on normal Tuesdays; P99 was 720 on the day after the November release"). A team sized to P50 misses SLA half the time. A team sized to P99 overstaffs by 30-50%. P90 is the right operating sizing point per Cleveland.

Canon: Brad Cleveland, Call Center Management on Fast Forward (4th ed., 2019); A.K. Erlang, The Theory of Probabilities and Telephone Conversations (1909).
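One way to get the two numbers Q3 asks for is to compute percentiles directly from daily volumes with the standard library. The sketch below is an assumption about how that could be done, not a description of the tool's intake code; the function name `demand_percentiles` and the sample data are hypothetical.

```python
import statistics


def demand_percentiles(daily_volumes):
    """Return (P50, P90, P99) of daily ticket volume from at least 90 days of history."""
    if len(daily_volumes) < 90:
        raise ValueError("need at least 90 days of history")
    # quantiles(n=100) returns the 99 cut points P1..P99.
    cuts = statistics.quantiles(daily_volumes, n=100, method="inclusive")
    return cuts[49], cuts[89], cuts[98]


# Synthetic 90-day history, for illustration only.
history = [400 + (day * 37) % 200 for day in range(90)]
p50, p90, p99 = demand_percentiles(history)
print(f"P50={p50:.0f}  P90={p90:.0f}  P99={p99:.0f}  P99-P90 gap={p99 - p90:.0f}")
```

The `inclusive` method treats the history as the full observed population; with only 90 days, the P99 estimate rests on one or two days, so record which days those were.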

Q4 — "At your planned utilization, what is P(SLA breach) at P90 and at P99?"

Recommended answer: two probabilities, computed (not guessed) from Erlang-C with your specific N, AHT, and SLA target. If P(breach at P90) > 10% you are understaffed at the sizing point. If P(breach at P99) > 50% you have no surge plan and the next peak event will be visible to the CEO.

Canon: Erlang (1909); Hopp & Spearman, Factory Physics (3rd ed., 2008), VUT equation.
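For readers who want to see what "computed from Erlang-C" means, here is a minimal sketch of the textbook M/M/N formulas. It is not the code of `capacity_modeler.py`. It assumes "SLA breach" means an item waits longer than the target before being picked up, ignores shrinkage, and uses hypothetical names and inputs (14 agents, 10-minute AHT, 5-minute target, an 8-hour day turning 480 and 720 tickets/day into 60 and 90 per hour).

```python
import math


def erlang_c_wait_probability(agents, offered_load):
    """P(an arriving item must queue) in an M/M/N system; offered_load is in Erlangs."""
    if agents <= offered_load:
        return 1.0  # unstable: the queue grows without bound
    blocking = 1.0  # Erlang-B with zero servers
    for n in range(1, agents + 1):
        # Standard Erlang-B recursion, numerically stable for large N.
        blocking = offered_load * blocking / (n + offered_load * blocking)
    return agents * blocking / (agents - offered_load * (1.0 - blocking))


def breach_probability(agents, arrivals_per_hour, aht_minutes, target_wait_minutes):
    """P(wait > target) = P(wait) * exp(-(N - A) * target / AHT)."""
    offered_load = arrivals_per_hour * aht_minutes / 60.0
    if agents <= offered_load:
        return 1.0
    p_wait = erlang_c_wait_probability(agents, offered_load)
    return p_wait * math.exp(-(agents - offered_load) * target_wait_minutes / aht_minutes)


# Hypothetical inputs, for illustration only.
for label, per_hour in (("P90", 60.0), ("P99", 90.0)):
    print(label, round(breach_probability(14, per_hour, 10.0, 5.0), 3))
```

When offered load meets or exceeds the number of agents the queue never drains, so the sketch reports a breach probability of 1.0 rather than evaluating the formula outside its valid range.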

Q5 — "Have you budgeted replacement hires for the attrition you'll see this year?"

Recommended answer: yes, with a specific number. At 30% annual attrition (Bersin BPO midpoint), a 20-FTE team loses ~6 people this year. If your "add 5 net" plan is actually a "hire 11" plan, the recruiting volume changes drastically. Anti-pattern #3.

Canon: Bersin/Deloitte talent benchmarks (2015-2023); Edward Lawler, Strategic Workforce Planning (USC CEO, 2008).
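The arithmetic behind the "add 5 net" versus "hire 11" example can be written out in a few lines. This is a simplified sketch with hypothetical function names: it applies the attrition rate to the starting headcount only and ignores attrition among the new hires themselves.

```python
def expected_leavers(current_fte, annual_attrition):
    """Expected departures over the year, based on starting headcount only."""
    return round(current_fte * annual_attrition)


def gross_hires(current_fte, net_adds, annual_attrition):
    """Hires needed to end the year at current_fte + net_adds."""
    return net_adds + expected_leavers(current_fte, annual_attrition)


# The example from the text: 20 FTE, 30% attrition, net +5.
print(gross_hires(20, 5, 0.30))  # 11
```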

Q6 — "When does span of control trigger a manager hire, and who is the candidate?"

Recommended answer: a specific quarter (from `hiring_sequencer.py`) and at least one identified candidate (internal lead or external hire). Past 7 ICs/manager, 1:1s degrade, feedback cycles slip, attrition climbs. Past 10 you have a coverage crisis. Hire the manager BEFORE crossing 10, not after.

Canon: Camille Fournier, The Manager's Path (O'Reilly, 2017), ch. 5; Andy Grove, High Output Management (1983).

Q7 — "What is your surge plan for the P99 day?"

Recommended answer: an explicit, documented plan — overflow tier, BPO contracted capacity, on-call rotation, executive escalation tree, OR a written degradation contract that says "on P99 days we extend SLA to X minutes and notify customers proactively". If the answer is "we'll figure it out", the P99 day is a fire visible to the board.

Canon: Hopp & Spearman, Factory Physics (2008); Reinertsen (2009) on capacity-margin discipline.

Walk these seven in order. One at a time. Write the answers down. The plan you submit is only as defensible as your answers to these seven questions.
