internal-docs

When this skill is activated, always start your first response with the 🧢 emoji.

Internal Docs

Internal documentation is the connective tissue of engineering organizations. It captures decisions (RFCs, design docs), preserves operational knowledge (runbooks), extracts lessons from failure (post-mortems), and makes institutional knowledge discoverable (knowledge management). This skill gives an agent the ability to draft, review, and improve internal documents that are clear, actionable, and structured for their specific audience - from a 2-page RFC to a detailed incident post-mortem.

When to use this skill

Trigger this skill when the user:

Wants to write or draft an RFC or design document
Needs to create a post-mortem or incident review document
Asks to build an operational runbook or playbook
Wants to organize or structure a team knowledge base
Needs to review an existing internal doc for completeness or clarity
Asks about documentation templates, formats, or best practices
Wants to write an ADR (Architecture Decision Record)
Needs to create onboarding documentation or team guides

Do NOT trigger this skill for:

Public-facing API documentation or developer docs (use api-design skill)
README files or open-source project documentation (use code-level docs conventions)

Key principles

Write for the reader, not the writer - Every document exists to transfer knowledge to someone else. Identify who will read it (decision-makers, on-call engineers, new hires) and structure for their needs, not your thought process.
Decisions over descriptions - The most valuable internal docs capture the "why" behind choices. A design doc that only describes the solution without explaining alternatives considered and tradeoffs made is incomplete.
Actionability is everything - A runbook that says "investigate the issue" is worthless. A post-mortem without concrete action items is theater. Every document should leave the reader knowing exactly what to do next.
Living documents decay - Docs that aren't maintained become dangerous. Every document needs an owner and a review cadence, or it should be marked with an explicit expiration date.
Structure enables skimming - Engineers don't read docs linearly. Use headers, TL;DRs, tables, and callouts so readers can find what they need in under 30 seconds.
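
The "owner plus review cadence" rule can be checked mechanically. The following is a minimal sketch, assuming each document carries a small metadata record; the `DocMeta` fields, the sample titles, and the team name are hypothetical and not part of any existing tool.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class DocMeta:
    """Hypothetical metadata record; field names are illustrative."""
    title: str
    owner: Optional[str]       # team or person responsible for the doc
    last_reviewed: date        # date of the most recent review
    review_every_days: int     # agreed review cadence in days


def needs_attention(doc: DocMeta, today: date) -> list:
    """Return the reasons a doc should be reviewed or archived (empty if none)."""
    reasons = []
    if not doc.owner:
        reasons.append("no owner assigned")
    due = doc.last_reviewed + timedelta(days=doc.review_every_days)
    if today > due:
        reasons.append(f"review overdue since {due.isoformat()}")
    return reasons


docs = [
    DocMeta("Checkout runbook", "payments", date(2024, 1, 10), 90),
    DocMeta("Legacy deploy guide", None, date(2023, 3, 1), 180),
]
report = {d.title: needs_attention(d, today=date(2024, 3, 1)) for d in docs}
```

A report like this could feed a periodic reminder or an archive queue; the 90 and 180 day cadences are placeholders, not recommendations.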

Core concepts

Internal docs fall into four categories, each with a distinct lifecycle and audience:

Decision documents (RFCs, design docs, ADRs) propose a change, gather feedback, and record the final decision. They flow through draft, review, and approved/rejected states. The audience is peers and stakeholders who need to evaluate the proposal. See `references/rfcs-and-design-docs.md`.

Incident documents (post-mortems, incident reviews) are written after something goes wrong. They reconstruct the timeline, identify root causes, and produce action items. The audience is the broader engineering org learning from failure. Blamelessness is non-negotiable. See `references/post-mortems.md`.

Operational documents (runbooks, playbooks, SOPs) provide step-by-step procedures for recurring tasks or incident response. The audience is the on-call engineer at 3 AM who needs to fix something fast. See `references/runbooks.md`.

Knowledge documents (wikis, guides, onboarding docs, team pages) preserve institutional knowledge. The audience varies but typically includes new team members and cross-team collaborators. See `references/knowledge-management.md`.

Common tasks

Draft an RFC

An RFC proposes a significant technical change and invites structured feedback. Use this template structure:

```markdown
# RFC: <Title>

Author: <name>
Status: Draft | In Review | Approved | Rejected
Created: <date>
Last updated: <date>
Reviewers: <list>
Decision deadline: <date>

## TL;DR
<2-3 sentences: what you propose and why>

## Motivation
<What problem does this solve? Why now? What happens if we do nothing?>

## Proposal
<The detailed solution. Include diagrams, data models, and API contracts where needed.>

## Alternatives considered
<At least 2 alternatives, each with honest pros and cons>

## Tradeoffs and risks
<What are we giving up? What could go wrong? How do we mitigate it?>

## Rollout plan
<How will this be rolled out in phases? Feature flags? Migration plan?>

## Open questions
<Unresolved items that need input from reviewers>
```

> Always include at least two genuine alternatives. A single-option RFC signals
> the decision was made before the review process started.

Write a post-mortem

Post-mortems extract organizational learning from incidents. Follow a blameless approach - focus on systems and processes, never on individuals.

```markdown
# Post-Mortem: <Incident title>

Date of incident: <date>
Severity: SEV-1 | SEV-2 | SEV-3
Author: <name>
Status: Draft | Review | Final
Time to detect: <duration>
Time to resolve: <duration>

## Summary
<3-4 sentences: what happened, who was affected, and the impact>

## Timeline
| Time (UTC) | Event |
|------------|-------|
| HH:MM | <what happened> |

## Root cause
<The deepest "why" - use the 5 Whys to dig past surface symptoms>

## Contributing factors
<Other conditions that caused or worsened the incident>

## What went well
<Parts of the response that worked - detection, communication, tooling, etc.>

## What went poorly
<Process or system gaps the incident exposed>

## Action items
| Action | Owner | Priority | Due date | Status |
|--------|-------|----------|----------|--------|
| <specific action> | <name> | P0/P1/P2 | <date> | Open |
```

> Every action item must be specific, assigned, and dated. "Improve monitoring"
> is not an action item. "Add latency p99 alert on checkout service at 500ms
> threshold" is.
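
The "specific, assigned, and dated" rule lends itself to a quick lint before a post-mortem is marked final. This is a rough sketch, assuming action items are available as dictionaries with `action`, `owner`, and `due` keys (a hypothetical shape) and that due dates use ISO format. The four-word minimum is a crude stand-in for "specific" and is an assumption of this example, not a rule from the text.

```python
from datetime import date


def action_item_problems(item: dict) -> list:
    """Return a list of problems with one action item (empty if it passes)."""
    problems = []
    if not (item.get("owner") or "").strip():
        problems.append("missing owner")
    try:
        # Assumes ISO dates such as "2024-07-01"; other formats are rejected.
        date.fromisoformat(item.get("due") or "")
    except ValueError:
        problems.append("missing or invalid due date")
    # Crude specificity heuristic (assumption): very short actions are usually vague.
    if len((item.get("action") or "").split()) < 4:
        problems.append("action too vague")
    return problems


vague = {"action": "Improve monitoring", "owner": "", "due": ""}
specific = {
    "action": "Add latency p99 alert on checkout service at 500ms threshold",
    "owner": "alice",  # hypothetical owner
    "due": "2024-07-01",
}
vague_result = action_item_problems(vague)
specific_result = action_item_problems(specific)
```

A word count cannot judge whether an action is genuinely concrete, so a check like this only catches the most obvious gaps; human review still decides.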

Create a runbook

Runbooks provide step-by-step procedures for operational tasks. Write them for the worst case: an engineer who has never seen this system, at 3 AM, under stress.

````markdown
# Runbook: <Procedure name>

Owner: <team>
Last verified: <date>
Estimated time: <duration>
Risk level: Low | Medium | High

## When to use
<Trigger conditions - which alert, symptom, or request calls for this runbook>

## Prerequisites
- Access to <system>
- Permissions: <specific roles or credentials needed>

## Steps

### Step 1: <Action>
<Exact command or UI action. No ambiguity.>
```bash
kubectl get pods -n production -l app=checkout
```
Expected output: <what you should see if things are working>
If this fails: <what to do - escalation path or alternative>

### Step 2: <Action>
...

## Rollback
<How to undo everything if the procedure goes wrong>

## Escalation
<Who to contact if the runbook does not resolve the issue>
````

> Test every runbook by having someone unfamiliar with the system follow it.
> If they get stuck, the runbook is incomplete.
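
Before handing a runbook to a human tester, a script can catch steps that lack an expected output or a failure path. The sketch below assumes the runbook is markdown with step headings of the form `### Step N: ...` and uses the `Expected output:` and `If this fails:` labels from the template; the function name and the sample runbook are illustrative.

```python
import re

REQUIRED_LABELS = ("Expected output:", "If this fails:")


def incomplete_steps(runbook_md: str) -> dict:
    """Map each step heading to the required labels its body is missing."""
    # Splitting on a pattern with one capture group yields
    # [preamble, heading, body, heading, body, ...].
    parts = re.split(r"^(#{1,6} Step \d+:.*)$", runbook_md, flags=re.MULTILINE)
    missing = {}
    for heading, body in zip(parts[1::2], parts[2::2]):
        # A step's body ends at the next heading of any kind (e.g. "## Rollback").
        body = re.split(r"^#{1,6} ", body, maxsplit=1, flags=re.MULTILINE)[0]
        gaps = [label for label in REQUIRED_LABELS if label not in body]
        if gaps:
            missing[heading.strip()] = gaps
    return missing


sample = """\
## Steps
### Step 1: Check pods
kubectl get pods -n production -l app=checkout
Expected output: all pods Running
If this fails: page the platform on-call
### Step 2: Restart deployment
Expected output: rollout completes
## Rollback
If this fails: undo the rollout
"""
result = incomplete_steps(sample)
```

Here Step 2 is flagged because its failure path is missing; the `If this fails:` line under Rollback does not count toward it. A shell comment starting with `# ` inside a step would be mistaken for a heading by this simple parser, so treat the output as a hint, not a verdict.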

Write an Architecture Decision Record (ADR)

ADRs are lightweight, immutable records of a single architectural decision.

```markdown
# ADR-<NNN>: <Decision title>

Status: Proposed | Accepted | Deprecated | Superseded by ADR-<NNN>
Date: <date>
Deciders: <names>

## Context
<What forces are at play? What constraint or opportunity triggered this decision?>

## Decision
<The change we are making. State it clearly in one paragraph.>

## Consequences
<What becomes easier? What becomes harder? What are the risks?>
```

> ADRs are append-only. If a decision is reversed, write a new ADR that supersedes
> the old one. Never edit a finalized ADR.
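
One way to picture the append-only rule is an in-memory log where reversing a decision adds a new record and only repoints the old record's status. This is an illustrative sketch, not a prescribed tool: the `ADR` class, the `supersede` helper, and the sample decisions are hypothetical, and it assumes that updating the status line to "Superseded by ADR-<NNN>" is the one permitted change to an old record.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ADR:
    number: int
    title: str
    status: str     # "Proposed", "Accepted", "Deprecated", or "Superseded by ADR-<NNN>"
    decision: str   # body text; left untouched once accepted


def supersede(log: list, old_number: int, title: str, decision: str) -> list:
    """Return a new log: the old ADR points at a freshly appended replacement."""
    if all(adr.number != old_number for adr in log):
        raise KeyError(f"ADR-{old_number:03d} not found")
    new_number = max(adr.number for adr in log) + 1
    updated = [
        # Only the status pointer changes; title and decision text stay as written.
        replace(adr, status=f"Superseded by ADR-{new_number:03d}")
        if adr.number == old_number else adr
        for adr in log
    ]
    updated.append(ADR(new_number, title, "Accepted", decision))
    return updated


log = [ADR(1, "Use PostgreSQL for orders", "Accepted", "Store orders in PostgreSQL.")]
new_log = supersede(log, 1, "Use event sourcing for orders", "Store orders as an event stream.")
```

The frozen dataclass makes accidental edits fail loudly, which mirrors the intent of the rule: history is extended, not rewritten.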

Review an existing document for quality

Walk through the doc checking these dimensions in order:

Audience - Is it clear who this is for? Does the depth match their expertise?
Structure - Can a reader find what they need by skimming headers?
Completeness - Are there gaps that will generate questions?
Actionability - Does the reader know what to do after reading?
Freshness - Is the information current? Are there stale references?
Conciseness - Can anything be cut without losing meaning?

Organize a knowledge base

Structure team knowledge around these four categories (adapted from Divio):

| Category | Purpose | Example |
|----------|---------|---------|
| Tutorials | Learning-oriented, step-by-step | "Setting up local dev environment" |
| How-to guides | Task-oriented, problem-solving | "How to deploy a canary release" |
| Reference | Information-oriented, accurate | "API rate limits by tier" |
| Explanation | Understanding-oriented, context | "Why we chose event sourcing" |

Avoid dumping all docs into a flat wiki. Tag documents by category, team, and system so they remain discoverable as the org scales.
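
Tagging by category, team, and system can be sketched as a small lookup index. The example assumes each document is described by a dictionary with `title`, `category`, `team`, and `system` keys; the key names, the category slugs, and the team and system values are hypothetical.

```python
from collections import defaultdict

# The four categories from the table above, as short slugs (an assumed naming).
CATEGORIES = {"tutorial", "how-to", "reference", "explanation"}


def build_index(docs: list) -> dict:
    """Group doc titles by (facet, value) so any tag can be looked up directly."""
    index = defaultdict(list)
    for doc in docs:
        if doc["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {doc['category']!r}")
        for facet in ("category", "team", "system"):
            index[(facet, doc[facet])].append(doc["title"])
    return dict(index)


docs = [
    {"title": "How to deploy a canary release", "category": "how-to",
     "team": "platform", "system": "deploy-pipeline"},
    {"title": "API rate limits by tier", "category": "reference",
     "team": "platform", "system": "api-gateway"},
]
index = build_index(docs)
```

Rejecting unknown categories at indexing time keeps the taxonomy from drifting as more teams add documents.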

Anti-patterns / common mistakes

| Mistake | Why it's wrong | What to do instead |
|---------|----------------|--------------------|
| Wall of text | No headers, no TL;DR, no structure - nobody will read it | Add TL;DR upfront, use headers every 3-5 paragraphs, use tables for structured data |
| Blame in post-mortems | Naming individuals creates fear and suppresses honest reporting | Focus on system and process failures. "The deploy pipeline lacked a canary step" not "Bob deployed without checking" |
| Runbook with "use judgment" | On-call engineers under stress cannot exercise judgment on unfamiliar systems | Provide explicit decision trees with concrete thresholds |
| RFC without alternatives | Signals the decision is already made and review is theater | Always include 2+ genuine alternatives with honest tradeoffs |
| Stale documentation | Outdated docs are worse than no docs - they build false confidence | Set review dates, assign owners, archive aggressively |
| Copy-paste templates | Filling a template mechanically without adapting to context | Templates are starting points - remove irrelevant sections, add context-specific ones |
| No action items | Post-mortems and reviews that identify problems but assign no follow-up | Every identified gap must produce a specific, assigned, dated action item |
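
The "wall of text" row can be turned into a rough automated check. This sketch assumes the document is markdown, that headings start with `#`, and that headings and paragraphs are separated by blank lines; the function name and the default limit of five paragraphs (the upper end of "every 3-5 paragraphs") are choices made for this example.

```python
import re


def skim_problems(markdown: str, max_paragraphs: int = 5) -> list:
    """Flag a missing TL;DR and long runs of paragraphs without a heading."""
    problems = []
    if not re.search(r"TL;DR", markdown, flags=re.IGNORECASE):
        problems.append("no TL;DR")
    run = 0      # paragraphs seen since the last heading
    longest = 0  # longest such run in the document
    for block in re.split(r"\n\s*\n", markdown.strip()):
        if block.lstrip().startswith("#"):
            run = 0
        else:
            run += 1
            longest = max(longest, run)
    if longest > max_paragraphs:
        problems.append(f"{longest} paragraphs in a row without a heading")
    return problems


wall = "\n\n".join(f"Paragraph {i}." for i in range(7))
structured = "# Title\n\nTL;DR: short.\n\n## Details\n\nBody."
wall_result = skim_problems(wall)
structured_result = skim_problems(structured)
```

A check like this only measures structure; it says nothing about whether the headings are useful, so it complements a human review and does not replace one.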

References

For detailed content on specific document types, read the relevant file from `references/`:

- `references/rfcs-and-design-docs.md` - Deep guide on RFC lifecycle, review processes, and design doc patterns
- `references/post-mortems.md` - Blameless post-mortem methodology, 5 Whys technique, and severity frameworks
- `references/runbooks.md` - Runbook authoring patterns, testing procedures, and maintenance workflows
- `references/knowledge-management.md` - Knowledge base organization, documentation culture, and tooling strategies

Only load a references file if the current task requires deep detail on that topic.
