platform-infrastructure

Platform & Infrastructure

Scope

Covers
  • Platform engineering / “paved roads”: shared capabilities that multiple product teams reuse
  • Infrastructure quality attributes: reliability, performance, privacy/safety, operability, cost
  • Scalability planning: capacity limits, leading indicators, “doomsday clock” triggers, sequencing
  • Instrumentation strategy: server-side event tracking, data quality, observability gaps
  • Discoverability architecture for web platforms (optional): sitemap + internal linking
When to use
  • “Create a platform infrastructure plan to increase feature velocity without repeating work.”
  • “Turn reliability/performance/privacy goals into concrete SLOs and an execution roadmap.”
  • “We’re approaching scaling limits—define triggers and the next infra projects.”
  • “Our analytics is messy—design a server-side tracking plan and event contract.”
  • “For a large web property, define sitemap + internal-linking requirements for crawlability.”
When NOT to use
  • You are handling an active incident or outage (use incident response/runbooks first).
  • You only need a single localized perf fix or refactor (just do the work).
  • You need product strategy/positioning for a platform-as-product (use platform-strategy).
  • You need a full feature spec or UX flows (use writing-specs-designs / writing-prds).
  • SEO/content strategy is the primary workstream (use content-marketing).

Inputs

Minimum required
  • System boundary (services/apps) + primary users/customers
  • Current pains (pick 1–3): reliability, performance, cost, privacy/security/compliance, developer velocity, data quality/analytics, SEO/discoverability
  • Current architecture constraints (data stores, runtime, deployment model, key dependencies)
  • Scale + trajectory (rough): current usage + expected growth + known upcoming spikes
  • Constraints: deadlines, staffing/capacity, risk tolerance, compliance/privacy requirements
Missing-info strategy
  • Ask up to 5 questions from references/INTAKE.md (3–5 at a time).
  • If details remain missing, proceed with explicit assumptions and provide 2–3 options.
  • If asked to change production systems or run commands, require explicit confirmation and include rollback guidance.

Outputs (deliverables)

Produce a Platform & Infrastructure Improvement Pack in Markdown (in-chat; or as files if requested), in this order:
  1. Context snapshot (scope, constraints, assumptions, stakeholders, success definition)
  2. Shared capabilities inventory + platformization plan (what to standardize, why, and how)
  3. Quality attributes spec (reliability/perf/privacy/safety targets; proposed SLOs/SLIs)
  4. Scaling “doomsday clock” + capacity plan (limits, triggers, lead time, projects)
  5. Instrumentation plan (observability gaps + server-side analytics event contract)
  6. Discoverability plan (optional) for web platforms (sitemap + internal linking requirements)
  7. Execution roadmap (sequencing, milestones, owners, dependencies, comms)
  8. Risks / Open questions / Next steps (always included)
Templates: references/TEMPLATES.md

Workflow (8 steps)

1) Intake + define “what decision will this enable?”

  • Inputs: Context; references/INTAKE.md.
  • Actions: Confirm scope boundaries, top pains, and time horizon. Write a 1–2 sentence decision statement (e.g., “We will standardize X and commit to SLO Y by date Z.”).
  • Outputs: Context snapshot (draft).
  • Checks: A stakeholder can answer: “What will we do differently after reading this?”

2) Find repeatable product capabilities worth platformizing

  • Inputs: Recent roadmap/initiatives; architecture overview; pain points.
  • Actions: Inventory repeated “feature components” (e.g., export, filtering, permissions, audit logs, notifications). Identify 3–7 candidates for shared infrastructure. Define what becomes the platform contract vs what remains product-specific.
  • Outputs: Shared capabilities inventory + platformization plan (draft).
  • Checks: Each candidate has: (a) at least 2 consumers, (b) a clear API/contract idea, (c) a migration/rollout approach.
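The three checks for Step 2 can be applied mechanically to each candidate. The sketch below is one possible way to do that; the `Candidate` record, its field names, and the sample values are hypothetical and not part of the skill's templates.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A hypothetical record for one platformization candidate."""
    name: str
    consumers: list = field(default_factory=list)  # teams/products that would reuse it
    contract: str = ""                             # one-line API/contract idea
    rollout: str = ""                              # migration/rollout approach


def readiness_gaps(c: Candidate) -> list:
    """Return the Step 2 checks this candidate still fails (empty list = ready)."""
    gaps = []
    if len(set(c.consumers)) < 2:
        gaps.append("needs at least 2 consumers")
    if not c.contract.strip():
        gaps.append("needs an API/contract idea")
    if not c.rollout.strip():
        gaps.append("needs a migration/rollout approach")
    return gaps


# Illustrative candidates only; names and values are made up.
export = Candidate("export", ["billing", "reports"],
                   "POST /exports returns a job id", "dual-run, then cut over")
audit_log = Candidate("audit-log", ["billing"])
```

A candidate with a non-empty gap list stays in the inventory but is not yet scheduled for platformization.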

3) Define quality attributes and targets (make “invisible work” explicit)

  • Inputs: Reliability/perf/privacy needs; customer expectations; compliance constraints.
  • Actions: Write the quality attributes spec. Propose SLOs/SLIs for reliability and performance; document privacy/safety requirements (data residency, encryption, access controls, retention).
  • Outputs: Quality attributes spec (draft).
  • Checks: Targets are measurable and owned, even if the initial numbers are estimates with a stated confidence level.
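To make "measurable and owned" concrete, an availability SLO can be expressed as a target ratio of good requests plus an error budget. This is a minimal sketch under assumptions: the `Slo` record, its fields, and the sample numbers are hypothetical, and a real SLI would come from your metrics system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Slo:
    """A hypothetical availability SLO: fraction of good requests over a window."""
    name: str
    target: float      # e.g. 0.999 means 99.9% of requests must be good
    window_days: int
    owner: str         # every target needs a named owner


def error_budget_remaining(slo: Slo, total: int, bad: int) -> float:
    """Return the fraction of the window's error budget still unspent.

    1.0 means nothing spent, 0.0 means exactly exhausted, negative means overspent.
    """
    if total == 0:
        return 1.0
    allowed_bad = (1.0 - slo.target) * total
    if allowed_bad == 0:
        return 1.0 if bad == 0 else float("-inf")
    return 1.0 - bad / allowed_bad


# Illustrative values only.
checkout = Slo("checkout-availability", target=0.999, window_days=30, owner="payments-team")
remaining = error_budget_remaining(checkout, total=1_000_000, bad=250)
```

With a 99.9% target, 1,000,000 requests allow about 1,000 bad ones, so 250 failures leave roughly three quarters of the budget.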

4) Build the scaling “doomsday clock”

  • Inputs: Current bottlenecks/limits; growth expectations; lead times for major changes.
  • Actions: Identify the top 3–10 capacity limits (DB size/IOPS, queue depth, cache hit rate, deploy throughput, rate limits). Define thresholds that trigger scaling projects early enough, accounting for each project's lead time.
  • Outputs: Doomsday clock table + capacity plan (draft).
  • Checks: Each limit has a metric, an alert threshold, a lead time estimate, and a named mitigation project.
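A lead-time-aware trigger can be derived from the limit, the growth rate, and the mitigation lead time. The sketch below assumes linear growth and a fixed safety buffer, both simplifications; the `CapacityLimit` record, its field names, and the numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CapacityLimit:
    """One hypothetical row of a doomsday clock table."""
    metric: str
    current: float
    hard_limit: float
    growth_per_day: float   # assumption: growth is roughly linear
    lead_time_days: int     # time needed to deliver the mitigation project
    mitigation: str


def days_until_limit(limit: CapacityLimit) -> float:
    """Days until the hard limit is reached at the current growth rate."""
    if limit.growth_per_day <= 0:
        return float("inf")
    return max(0.0, (limit.hard_limit - limit.current) / limit.growth_per_day)


def trigger_threshold(limit: CapacityLimit, buffer_days: int = 30) -> float:
    """Metric value at which the mitigation project must start."""
    runway_needed = limit.lead_time_days + buffer_days
    return limit.hard_limit - limit.growth_per_day * runway_needed


def should_start_project(limit: CapacityLimit, buffer_days: int = 30) -> bool:
    return limit.current >= trigger_threshold(limit, buffer_days)


# Illustrative values only.
db_disk = CapacityLimit("postgres_disk_gb", current=600, hard_limit=1000,
                        growth_per_day=2.0, lead_time_days=90,
                        mitigation="partition and archive cold tables")
```

Here the alert threshold sits at 760 GB rather than near 1000 GB, because the project needs 90 days plus a 30-day buffer of runway.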

5) Decide instrumentation: observability + server-side analytics

  • Inputs: Current logging/metrics/tracing; current analytics tracking approach.
  • Actions: Specify observability gaps (must-have dashboards/alerts) and define an event contract for server-side analytics (names, properties, identity strategy, delivery guarantees, QA checks).
  • Outputs: Instrumentation plan (draft).
  • Checks: Event definitions are consistent across clients; key events are captured server-side; data-quality checks exist.
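An event contract becomes enforceable once it is written as data and checked before events are emitted. The following is a minimal sketch of such a data-quality check, not a prescribed schema: the envelope field names, the `EventSpec` record, and the sample event are all assumptions.

```python
from dataclasses import dataclass

# Envelope fields every server-side event carries in this sketch (assumed names).
# event_id can double as an idempotency key if delivery is at-least-once.
ENVELOPE_FIELDS = ("event", "event_id", "user_id", "occurred_at")


@dataclass(frozen=True)
class EventSpec:
    """A hypothetical contract entry: event name plus typed properties."""
    name: str
    properties: dict   # property name -> expected Python type


def validate_event(spec: EventSpec, event: dict) -> list:
    """Return a list of contract violations (empty list = event conforms)."""
    problems = []
    for key in ENVELOPE_FIELDS:
        if not event.get(key):
            problems.append(f"missing envelope field: {key}")
    if event.get("event") not in (None, "", spec.name):
        problems.append(f"unexpected event name: {event['event']}")
    props = event.get("properties", {})
    for prop, expected in spec.properties.items():
        if prop not in props:
            problems.append(f"missing property: {prop}")
        elif not isinstance(props[prop], expected):
            problems.append(f"wrong type for {prop}: expected {expected.__name__}")
    for prop in props:
        if prop not in spec.properties:
            problems.append(f"undeclared property: {prop}")
    return problems


# Illustrative contract and event only.
export_completed = EventSpec("export_completed", {"format": str, "row_count": int})
good_event = {
    "event": "export_completed", "event_id": "e1", "user_id": "u1",
    "occurred_at": "2024-01-01T00:00:00Z",
    "properties": {"format": "csv", "row_count": 10},
}
```

Running the same validator in every emitting service is one way to keep event definitions consistent across clients.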

6) (Optional) Discoverability architecture for web platforms

  • Inputs (if applicable): site/app information architecture; SEO importance; crawl constraints.
  • Actions: Define sitemap requirements (categorization, pagination, freshness) and internal-linking rules (“related content”, indexability controls, canonicalization).
  • Outputs: Discoverability plan (draft) or “Not applicable” decision.
  • Checks: A crawler can reach all indexable pages via links/sitemaps; “noindex”/canonicals are intentional.
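The sitemap requirements above (pagination, freshness, intentional indexability) can be sketched as a small generator. The sitemaps.org protocol caps each file at 50,000 URLs, so large properties need chunking. The page tuples and `example.com` URLs below are hypothetical, and the indexable flag stands in for whatever "noindex" logic your platform uses.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000  # per-file limit in the sitemaps.org protocol


def build_sitemaps(pages, max_urls=MAX_URLS_PER_FILE):
    """Build one XML string per sitemap file.

    pages: iterable of (url, lastmod_iso_date, indexable) tuples.
    Pages flagged as not indexable are left out on purpose.
    """
    indexable = [(url, lastmod) for url, lastmod, ok in pages if ok]
    files = []
    for start in range(0, len(indexable), max_urls):
        urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
        for url, lastmod in indexable[start:start + max_urls]:
            node = ET.SubElement(urlset, "url")
            ET.SubElement(node, "loc").text = url
            ET.SubElement(node, "lastmod").text = lastmod  # freshness signal
        files.append(ET.tostring(urlset, encoding="unicode"))
    return files


# Illustrative pages only; a tiny max_urls shows the pagination behavior.
pages = [
    ("https://example.com/a", "2024-01-01", True),
    ("https://example.com/b", "2024-01-02", True),
    ("https://example.com/c", "2024-01-03", False),  # intentionally noindex
    ("https://example.com/d", "2024-01-04", True),
    ("https://example.com/e", "2024-01-05", True),
]
sitemap_files = build_sitemaps(pages, max_urls=2)
```

A production version would also emit a sitemap index file that lists the chunks; that part is omitted here.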

7) Turn decisions into a sequenced execution roadmap

  • Inputs: Draft deliverables; constraints; dependencies; capacity.
  • Actions: Prioritize initiatives using impact × risk × effort × lead time. Create milestones, owners, and rollout plans (including deprecation/decommission for old paths).
  • Outputs: Execution roadmap (draft).
  • Checks: Roadmap has a first executable milestone, explicit dependencies, and measurable acceptance criteria.
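The skill names four prioritization factors but does not define how to combine them. The sketch below is one possible interpretation, stated as an assumption: each factor is scored 1–5, higher impact, risk, and lead time raise priority (long-lead work must start earlier), and higher effort lowers it. The `Initiative` record and sample scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Initiative:
    """A hypothetical roadmap candidate; every factor is scored 1-5."""
    name: str
    impact: int      # value delivered if it ships
    risk: int        # severity of the risk it removes
    effort: int      # relative cost to deliver
    lead_time: int   # how long before results land; long lead means start sooner


def priority_score(i: Initiative) -> float:
    """Assumed formula: benefit-side factors multiplied, divided by effort."""
    return i.impact * i.risk * i.lead_time / i.effort


def sequence(initiatives):
    """Order initiatives from highest to lowest priority score."""
    return sorted(initiatives, key=priority_score, reverse=True)


# Illustrative scores only.
roadmap = sequence([
    Initiative("shared export service", impact=4, risk=2, effort=3, lead_time=2),
    Initiative("Postgres sharding", impact=5, risk=5, effort=5, lead_time=5),
    Initiative("server-side event contract", impact=3, risk=3, effort=1, lead_time=1),
])
```

The score only proposes an order; dependencies and capacity from the Inputs still decide the final sequencing.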

8) Quality gate + finalize

  • Inputs: Full draft pack.
  • Actions: Run references/CHECKLISTS.md and score with references/RUBRIC.md. Tighten unclear contracts, add missing measures, and always include Risks / Open questions / Next steps.
  • Outputs: Final Platform & Infrastructure Improvement Pack.
  • Checks: A team can execute without extra meetings; unknowns are explicit and owned.

Quality gate (required)

  • Use references/CHECKLISTS.md and references/RUBRIC.md.
  • Always include: Risks, Open questions, Next steps.

Examples

Example 1 (shared capabilities): “Use platform-infrastructure for a B2B analytics app where every team keeps rebuilding export, filtering, and permissions. Output a platformization plan + roadmap + SLO targets.”
Example 2 (scaling readiness): “We expect 5× traffic in 6 months. Define a doomsday clock for Postgres limits, propose scaling projects, and set reliability/performance SLOs. Also standardize server-side analytics.”
Boundary example: “We’re mid-incident and pages are down—tell us what to do right now.”
Response: out of scope; recommend incident response first, then use this skill post-incident to create the scaling plan and reliability roadmap.