platform-engineering

Platform Engineering

Purpose

Build Internal Developer Platforms (IDPs) that provide self-service infrastructure, reduce cognitive load, and accelerate developer productivity through golden paths and platform-as-product thinking.
Platform engineering represents the evolution beyond traditional DevOps, focusing on creating product-quality internal platforms that treat developers as customers. The discipline addresses a developer productivity crisis in which engineers spend 30-40% of their time on infrastructure and tooling instead of features.

When to Use This Skill

Trigger this skill when:
  • Building or improving an internal developer platform
  • Designing a developer portal (Backstage, Port, or commercial IDP)
  • Implementing golden paths and software templates
  • Establishing or restructuring a platform engineering team
  • Measuring and improving developer experience (DevEx)
  • Integrating IDP with infrastructure, CI/CD, observability, or security tools
  • Driving platform adoption across an engineering organization
  • Assessing platform maturity and identifying capability gaps

Core Concepts

Platform as Product

Treat internal platforms with the same rigor as customer-facing products:
Product Management Approach:
  • Define platform vision, strategy, and roadmap
  • Identify developer "customers" and their pain points
  • Measure success via adoption metrics, satisfaction surveys, and business impact
  • Iterate based on feedback loops and usage analytics
  • Balance new capabilities with platform reliability and support
Key Differences from Traditional DevOps:
  • DevOps focuses on delivery pipelines; platform engineering builds comprehensive developer experiences
  • Platform teams operate as product teams (product managers, UX designers, engineers)
  • Success measured by developer productivity and satisfaction, not just infrastructure metrics
  • Self-service is the primary interface, not ticket queues

Internal Developer Platform (IDP) Architecture

Three-Layer Architecture:
1. Developer Portal (Frontend)
  • Service catalog: Inventory of services with ownership, dependencies, health status
  • Software templates: Project scaffolding with best practices baked in
  • Documentation hub: Centralized, searchable, version-controlled docs
  • Self-service workflows: Environment provisioning, deployments, access requests
2. Platform Orchestration (Backend)
  • Infrastructure provisioning: Multi-cloud resource management
  • Environment management: Dev, staging, production lifecycle
  • Deployment automation: GitOps-based continuous delivery
  • Configuration management: Separation of app and infrastructure concerns
3. Integration Layer (Glue)
  • CI/CD integration: Pipeline visibility and triggering
  • Observability: Metrics, logs, traces surfaced in portal
  • Security: Vulnerability scanning, policy enforcement, secrets management
  • FinOps: Cost visibility, budgets, optimization recommendations
For detailed architecture patterns and component breakdowns, see references/idp-architecture.md.
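As a rough illustration of the service catalog described in the portal layer, the sketch below models catalog entries with ownership, dependencies, and health status, then answers two questions a portal typically needs: which entries are incomplete, and which services are affected when one goes down. The `CatalogEntry` fields and both function names are assumptions made for this example; they are not the data model of Backstage or any other portal.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One service in the catalog (field names are illustrative)."""
    name: str
    owner: str
    depends_on: list[str] = field(default_factory=list)
    healthy: bool = True


def catalog_problems(catalog: dict[str, CatalogEntry]) -> list[str]:
    """List entries with no owner or with dependencies missing from the catalog."""
    problems = []
    for entry in catalog.values():
        if not entry.owner:
            problems.append(f"{entry.name}: no owner")
        for dep in entry.depends_on:
            if dep not in catalog:
                problems.append(f"{entry.name}: unknown dependency {dep}")
    return problems


def impacted_by(catalog: dict[str, CatalogEntry], service: str) -> set[str]:
    """Return services that depend on `service`, directly or transitively."""
    impacted: set[str] = set()
    frontier = [service]
    while frontier:
        current = frontier.pop()
        for entry in catalog.values():
            if current in entry.depends_on and entry.name not in impacted:
                impacted.add(entry.name)
                frontier.append(entry.name)
    return impacted


# Hypothetical catalog used only to exercise the functions above.
catalog = {
    "ledger": CatalogEntry("ledger", "team-finance"),
    "payments": CatalogEntry("payments", "team-payments", ["ledger"]),
    "checkout": CatalogEntry("checkout", "", ["payments", "fraud"]),
}
print(catalog_problems(catalog))
print(sorted(impacted_by(catalog, "ledger")))
```

Keeping ownership and dependencies in one record is what lets the portal show blast radius and route incidents to an owner without a separate lookup.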

Golden Paths and Scaffolding

Golden Path Principle: Provide opinionated templates that handle 80% of use cases while allowing escape hatches for the remaining 20%.
Template Components:
  • Repository structure and boilerplate code
  • Infrastructure as code (Kubernetes manifests, Terraform)
  • CI/CD pipeline configurations
  • Observability instrumentation (metrics, logging, tracing)
  • Security configurations (RBAC, network policies, secrets)
  • Documentation templates (README, runbooks, architecture diagrams)
Constraint Mechanisms:
  • Policy-as-code enforcement (OPA, Kyverno) for security and compliance
  • Resource limits and quotas to prevent over-provisioning
  • Required health checks and observability instrumentation
  • Approved base images and dependency scanning
For template design patterns and examples, see references/golden-paths.md.
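The constraint mechanisms listed above are normally expressed in a policy engine such as OPA or Kyverno. The Python below is only a stand-in that shows the shape of the checks: an approved base image, resource limits, a health check, and observability instrumentation. The manifest keys, the image names in `APPROVED_BASE_IMAGES`, and the function name are all hypothetical.

```python
# Hypothetical allow-list; a real platform would load this from policy config.
APPROVED_BASE_IMAGES = {
    "registry.example.com/base/python:3.12",
    "registry.example.com/base/node:20",
}

REQUIRED_LIMITS = ("cpu_limit", "memory_limit")


def check_constraints(manifest: dict) -> list[str]:
    """Return a list of golden-path violations for a simplified service manifest."""
    violations = []
    if manifest.get("base_image") not in APPROVED_BASE_IMAGES:
        violations.append("base image is not on the approved list")
    resources = manifest.get("resources", {})
    for key in REQUIRED_LIMITS:
        if key not in resources:
            violations.append(f"missing resource limit: {key}")
    if not manifest.get("health_check_path"):
        violations.append("missing health check")
    if not manifest.get("metrics_enabled", False):
        violations.append("observability instrumentation disabled")
    return violations


# Example: a manifest that skips most of the required settings.
draft = {
    "base_image": "docker.io/library/ubuntu:latest",
    "resources": {"cpu_limit": "500m"},
}
for problem in check_constraints(draft):
    print(problem)
```

Returning every violation at once, rather than failing on the first, gives developers a single list to fix before they retry.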

Developer Experience (DevEx) Optimization

Cognitive Load Reduction:
  • Abstract infrastructure complexity without hiding necessary details
  • Provide sensible defaults with clear override mechanisms
  • Use progressive disclosure (simple for common cases, advanced options available)
  • Consolidate tooling (single developer portal vs. 15+ separate tools)
Key Metrics:
DORA Metrics:
  • Deployment frequency (how often code reaches production)
  • Lead time for changes (commit to production duration)
  • Mean time to recovery (MTTR for incidents)
  • Change failure rate (percentage of deployments causing incidents)
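To make the four DORA metrics concrete, here is a minimal sketch that derives them from a list of deployment records. The `Deployment` record shape is an assumption, recovery time is measured from the failing deployment to recovery as a simplification, and the mean is used where many teams prefer the median.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass
class Deployment:
    """Illustrative deployment record; real data would come from CI/CD and incident tools."""
    committed_at: datetime
    deployed_at: datetime
    caused_incident: bool = False
    recovered_at: Optional[datetime] = None


def _hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def dora_summary(deployments: list[Deployment], window_days: int) -> dict:
    """Compute the four DORA metrics over a reporting window of `window_days`."""
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.caused_incident]
    recoveries = [_hours(d.recovered_at - d.deployed_at) for d in failures if d.recovered_at]
    return {
        "deployments_per_day": len(deployments) / window_days,
        "mean_lead_time_hours": mean(lead_times) if lead_times else None,
        "change_failure_rate": len(failures) / len(deployments) if deployments else None,
        "mean_time_to_recovery_hours": mean(recoveries) if recoveries else None,
    }


# Four deployments over a two-day window, one of which caused an incident.
base = datetime(2024, 1, 1, 9, 0)
history = [
    Deployment(base, base + timedelta(hours=2)),
    Deployment(base, base + timedelta(hours=4)),
    Deployment(base, base + timedelta(hours=6), caused_incident=True,
               recovered_at=base + timedelta(hours=7)),
    Deployment(base, base + timedelta(hours=8)),
]
print(dora_summary(history, window_days=2))
```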
SPACE Framework:
  • Satisfaction: Developer happiness via surveys and NPS
  • Performance: Throughput and efficiency of work completed
  • Activity: Code commits, PRs, deployments (context, not raw counts)
  • Communication: Collaboration quality, discoverability
  • Efficiency: Minimize interruptions, reduce toil
Platform-Specific Metrics:
  • Platform adoption rate (percentage of teams using platform)
  • Self-service rate (actions completed without platform team tickets)
  • Onboarding time (new developer to first production deployment)
  • Template usage (which golden paths are adopted)
  • Support ticket volume and resolution time
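The platform-specific metrics above reduce to a few ratios. The sketch below computes adoption rate, self-service rate, and median onboarding time from raw counts; the function name, parameter names, and the choice of median are assumptions for illustration.

```python
from statistics import median


def platform_metrics(
    teams_total: int,
    teams_on_platform: int,
    actions_total: int,
    actions_via_ticket: int,
    onboarding_days: list[float],
) -> dict:
    """Summarize platform adoption from simple counts.

    adoption_rate      = teams using the platform / all teams
    self_service_rate  = actions completed without a platform-team ticket / all actions
    """
    if teams_total <= 0 or actions_total <= 0:
        raise ValueError("totals must be positive")
    return {
        "adoption_rate": teams_on_platform / teams_total,
        "self_service_rate": (actions_total - actions_via_ticket) / actions_total,
        "median_onboarding_days": median(onboarding_days) if onboarding_days else None,
    }


# Example: 10 of 40 teams onboarded, 50 of 200 actions still needed a ticket.
print(platform_metrics(40, 10, 200, 50, onboarding_days=[1, 3, 10]))
```

The median is used for onboarding time so that one unusually slow onboarding does not dominate the figure.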

Platform Maturity Assessment

Assess current platform capabilities using a six-level maturity model (Levels 0-5):
  • Level 0: Ad-Hoc - Manual provisioning, no standardization
  • Level 1: Basic Automation - Some IaC and CI/CD, limited self-service
  • Level 2: Paved Paths - Golden path templates, early portal, limited coverage
  • Level 3: Self-Service Platform - Comprehensive portal, 80%+ self-service
  • Level 4: Product-Driven Platform - Data-driven, product team structure, FinOps integration
  • Level 5: AI-Augmented Platform - AI-assisted troubleshooting, predictive optimization
For detailed assessment framework, gap analysis, and improvement roadmap, see references/maturity-model.md.
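One way to apply the maturity model is to treat each level as a gate that only counts once every lower gate has passed. The sketch below does that with a handful of boolean signals. The `PlatformSnapshot` fields are assumptions that compress each level's description into a single check, so a real assessment would need more criteria per level.

```python
from dataclasses import dataclass


@dataclass
class PlatformSnapshot:
    """Coarse, illustrative signals; one per maturity level above Level 0."""
    has_iac_and_ci: bool = False               # Level 1: basic automation
    has_golden_path_templates: bool = False    # Level 2: paved paths
    self_service_rate: float = 0.0             # Level 3: 80%+ self-service
    has_product_team_and_finops: bool = False  # Level 4: product-driven
    has_ai_assistance: bool = False            # Level 5: AI-augmented


def maturity_level(snapshot: PlatformSnapshot) -> int:
    """Return the highest level whose gate, and every gate below it, is satisfied."""
    gates = [
        snapshot.has_iac_and_ci,
        snapshot.has_golden_path_templates,
        snapshot.self_service_rate >= 0.80,
        snapshot.has_product_team_and_finops,
        snapshot.has_ai_assistance,
    ]
    level = 0
    for passed in gates:
        if not passed:
            break
        level += 1
    return level


# Templates exist, but only half of actions are self-service: Level 2.
print(maturity_level(PlatformSnapshot(True, True, 0.5)))
```

Because the gates are cumulative, an organization with AI tooling but no infrastructure as code still scores Level 0, which matches the idea that later levels build on earlier ones.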

Decision Frameworks

Build vs. Buy IDP

Choose Open Source (Backstage) when:
  • Large enterprise (1000+ engineers)
  • Dedicated platform team available (5-10 engineers)
  • Deep customization required
  • Open-source ecosystem preferred
  • Long-term investment (3+ year horizon)
Choose Commercial IDP (Port, Humanitec, Cortex) when:
  • Mid-size organization (100-1000 engineers)
  • Faster time-to-value needed (3-6 months vs. 6-12 months)
  • Prefer managed solution with vendor support
  • Limited platform engineering resources (<5 engineers)
  • Standard use cases (web apps, microservices, CI/CD)
Choose Hybrid Approach when:
  • Large organization needing both flexibility and speed
  • Complex infrastructure requiring orchestration backend
  • Want best-in-class portal + orchestration components
  • Willing to integrate multiple systems (e.g., Backstage + Humanitec)
For complete decision tree, selection criteria, and ROI calculations, see references/decision-frameworks.md.
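The criteria above can be read as a small decision function. The sketch below encodes only the thresholds stated in the lists (1000+ engineers, a platform team of at least 5, deep customization, time-to-value pressure). The ordering of the rules and the fallback value are assumptions, and the result is a starting point for discussion rather than a verdict.

```python
def recommend_idp(
    engineers: int,
    platform_engineers: int,
    needs_deep_customization: bool,
    needs_fast_time_to_value: bool,
) -> str:
    """Map organization traits to 'open-source', 'commercial', 'hybrid', or 'unclear'."""
    large = engineers >= 1000
    staffed = platform_engineers >= 5

    # Large, well-staffed orgs that want both flexibility and speed: combine tools.
    if large and staffed and needs_deep_customization and needs_fast_time_to_value:
        return "hybrid"
    # Large, well-staffed orgs that prioritize customization: open source (e.g. Backstage).
    if large and staffed and needs_deep_customization:
        return "open-source"
    # Small platform teams or time pressure favor a managed commercial IDP.
    if not staffed or needs_fast_time_to_value:
        return "commercial"
    # No strong signal either way; fall back to the full decision tree.
    return "unclear"


print(recommend_idp(1500, 8, True, False))   # open-source
print(recommend_idp(300, 3, False, True))    # commercial
print(recommend_idp(2000, 10, True, True))   # hybrid
```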

Golden Path Design: Flexibility vs. Standardization

Spectrum of Control:
High Standardization (Regulated Industries):
  • Limited technology choices, mandatory templates
  • Policy enforcement via admission controllers (OPA, Kyverno)
  • Escape hatches require approval process
Balanced Approach (Recommended for Most):
  • Recommended golden paths (easy, well-documented, supported)
  • Alternatives allowed with documentation
  • Soft enforcement (defaults + education, not hard blocks)
  • Clear ownership for deviations ("deviate and own")
High Flexibility (Innovative Organizations):
  • Golden paths as suggestions (not requirements)
  • Minimal policy enforcement (only critical security)
  • "Build it, run it" ownership model
For detailed guidance on choosing the right balance and enforcement strategies, see references/decision-frameworks.md.
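The three points on the control spectrum differ mainly in what happens when a team leaves the golden path. A minimal sketch of that decision follows. The `Mode` enum, the `decide` function, and the rule that critical security violations block in every mode are assumptions layered on the descriptions above.

```python
from enum import Enum


class Mode(Enum):
    HIGH_STANDARDIZATION = "high_standardization"
    BALANCED = "balanced"
    HIGH_FLEXIBILITY = "high_flexibility"


def decide(
    mode: Mode,
    off_golden_path: bool,
    critical_security_violation: bool,
    approved: bool = False,
    owner_documented: bool = False,
) -> str:
    """Return 'allow', 'warn', or 'block' for a proposed deployment."""
    # Assumption: critical security issues are blocked regardless of mode.
    if critical_security_violation:
        return "block"
    if not off_golden_path:
        return "allow"
    if mode is Mode.HIGH_STANDARDIZATION:
        # Escape hatches require an approval process.
        return "allow" if approved else "block"
    if mode is Mode.BALANCED:
        # Soft enforcement: deviations are allowed once ownership is documented.
        return "allow" if owner_documented else "warn"
    # High flexibility: golden paths are suggestions only.
    return "allow"


for mode in Mode:
    print(mode.value, decide(mode, off_golden_path=True, critical_security_violation=False))
```

In the balanced mode the function never returns "block" for a mere deviation, which is the "defaults plus education, not hard blocks" behavior described above.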

Platform Team Structure

Centralized Model:
  • Single platform team (5-20 engineers) serving entire organization
  • Best for: Small to mid-size orgs (100-500 engineers)
Federated Model:
  • Central team (5-10 engineers) + embedded engineers (1-2 per business unit)
  • Best for: Large orgs (500-2000+ engineers), multiple business units
Hub-and-Spoke Model:
  • Central "hub" team (3-5 engineers) + "spoke" teams contributing plugins
  • Best for: Organizations with strong open-source culture
For team sizing, roles, responsibilities, and governance models, see references/decision-frameworks.md.

Tool Recommendations

Developer Portals

Backstage (Open Source, CNCF)
  • Trust Score: 78.7/100, 8,876 code snippets
  • Software catalog, scaffolder, TechDocs, plugin ecosystem
  • Recommended for: Enterprises with platform teams
Port (Commercial)
  • Managed platform, modern UI/UX, faster time-to-value
  • Recommended for: Mid-size orgs (100-1000 engineers)
Cortex (Commercial SaaS)
  • Enterprise IDP, compliance focus, engineering standards enforcement
  • Recommended for: Regulated industries

Platform Orchestration

Crossplane (Open Source, CNCF)
  • Trust Score: 67.4/100, universal control plane for multi-cloud
  • Kubernetes-native declarative infrastructure
  • Recommended for: Multi-cloud abstractions
Humanitec (Commercial)
  • Platform Orchestrator backend, environment and deployment management
  • Recommended for: Complex infrastructure, complements portals
Terraform Cloud (Commercial)
  • Mature IaC orchestration, workspace management
  • Recommended for: Terraform-heavy organizations

GitOps Continuous Delivery

Argo CD (Open Source, CNCF) - RECOMMENDED
  • Trust Score: 91.8/100 (HIGHEST)
  • Declarative GitOps for Kubernetes, multi-cluster management
  • Industry-leading documentation and community
Flux (Open Source, CNCF)
  • Toolkit approach, Kubernetes-native
  • Good for: GitOps-native operations
For detailed tool comparisons, integration patterns, and selection criteria, see references/tool-recommendations.md.

Implementation Guides

Bootstrapping a Platform

Foundation Phase (Months 1-3):
  1. Define platform vision and form platform team (3-5 members)
  2. Interview developers to identify pain points
  3. Set up developer portal (Backstage or commercial)
  4. Create initial service catalog and first golden path template
Pilot Phase (Months 4-6):
  1. Select 2-3 pilot teams for white-glove onboarding
  2. Rapid iteration based on feedback
  3. Expand to 3-5 golden path templates
  4. Integrate key tools (CI/CD, monitoring, secrets)
Expansion Phase (Months 7-12):
  1. Scale to 20-50% of engineering teams
  2. Build self-service documentation and training
  3. Establish platform SLOs and on-call rotation
  4. Internal evangelization (demos, champions program)
Maturity Phase (Year 2+):
  1. 80%+ adoption across organization
  2. Platform team operates as product team
  3. Continuous improvement via metrics and feedback
  4. AI-assisted capabilities, policy-as-code expansion
For detailed implementation steps and bootstrapping code, see references/implementation-backstage.md.

Creating Golden Path Templates

Template Design Process:
  1. Identify most common use case (web app, API, data pipeline)
  2. Define opinionated choices (language, framework, deployment pattern)
  3. Create repository structure and infrastructure manifests
  4. Configure CI/CD pipeline with security scanning
  5. Instrument observability and document usage
  6. Test with pilot team before broad rollout
Template Categories:
  • Full-stack web application (backend API + frontend + database)
  • Data pipeline (ETL/ELT with orchestration)
  • Machine learning service (model serving, monitoring)
  • Event-driven microservice (message broker integration)
  • Scheduled job (cron jobs, batch processing)
For template examples, scaffolding code, and customization patterns, see references/golden-paths.md and the examples/ directory.
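At its core, a golden path template is a set of files with placeholders plus opinionated defaults. The sketch below renders such a template in memory using Python's standard `string.Template`. The file paths, placeholder names, and default replica count are hypothetical, and portal scaffolders such as Backstage's use their own template formats.

```python
from string import Template

# Hypothetical template: file path -> file body with $placeholders.
TEMPLATE_FILES = {
    "README.md": "# $service_name\n\nOwner: $owner\n",
    "deploy/values.yaml": "service: $service_name\nreplicas: $replicas\n",
}

# Opinionated defaults that callers may override (the "escape hatch").
DEFAULTS = {"replicas": "2"}


def render_template(params: dict[str, str]) -> dict[str, str]:
    """Render every template file; raises KeyError if a required parameter is missing."""
    merged = {**DEFAULTS, **params}
    return {
        path: Template(body).substitute(merged)
        for path, body in TEMPLATE_FILES.items()
    }


rendered = render_template({"service_name": "orders", "owner": "team-checkout"})
for path, body in rendered.items():
    print(f"--- {path}\n{body}")
```

Using `substitute` rather than `safe_substitute` makes a missing parameter fail loudly, so a half-rendered repository is never produced.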

Driving Platform Adoption

Evangelization Strategies:
  • Showcase pilot team successes (internal blog posts, demos)
  • Lunch-and-learns on platform capabilities
  • Internal champions program (power users helping peers)
  • Office hours and Slack/Teams support channels
Incentive Alignment:
  • Make platform easier than alternatives (golden paths are "paved roads")
  • Integrate with workflows developers already use
  • Provide immediate value (faster onboarding, better visibility)
  • Celebrate early adopters, showcase their successes
For adoption metrics, tracking dashboards, and success patterns, see references/maturity-model.md.

Quick Reference

Platform Engineering Checklist

Strategy and Vision:
  • Platform vision and charter documented
  • Platform team formed with clear roles
  • Developer pain points identified via interviews
  • Success metrics defined (DORA, SPACE, adoption)
IDP Foundation:
  • Developer portal deployed (Backstage, Port, or commercial)
  • Service catalog established (ownership, dependencies, health)
  • First golden path template created and validated
  • Documentation hub accessible to all engineers
Self-Service Capabilities:
  • Environment provisioning (dev, staging, production)
  • Deployment automation (GitOps with Argo CD or Flux)
  • CI/CD integration visible in portal
  • Observability dashboards per-service
Security and Compliance:
  • Policy-as-code enforcement (OPA, Kyverno)
  • Secrets management integrated (Vault, cloud providers)
  • Vulnerability scanning in pipelines
  • RBAC and access controls configured
Operations and Support:
  • Platform SLOs defined and monitored
  • Support channels established (Slack, office hours)
  • Incident response playbooks documented
  • Feedback loops and usage analytics in place

Common Pitfalls

Building Too Much Upfront:
  • Start small (1 golden path, pilot team) and iterate
  • Avoid "boil the ocean" syndrome
Ignoring Developer Feedback:
  • Establish continuous feedback loops, not just quarterly surveys
Over-Standardization:
  • Provide clear escape hatches for advanced use cases
Under-Measuring Success:
  • Track DORA metrics, satisfaction surveys, self-service rates
Treating Platform as IT Project:
  • Platform engineering is product development, not infrastructure provisioning
  • Requires product managers, UX designers, customer focus

Integration with Other Skills

Related Skills:
  • kubernetes-operations: Cluster operations, namespace management, RBAC, network policies
  • infrastructure-as-code: Terraform, Pulumi for infrastructure provisioning integrated with platform
  • gitops-workflows: GitOps principles, Argo CD / Flux implementation patterns
  • building-ci-pipelines: CI/CD pipeline design integrated into platform templates
  • security-hardening: Security best practices enforced through golden paths
  • secret-management: Secrets management integrated into platform (Vault, cloud providers)
  • observability: Monitoring, logging, tracing integrated into developer portal
Cross-Skill Workflows:
Platform Bootstrapping:
  1. Use infrastructure-as-code to provision platform infrastructure
  2. Use kubernetes-operations to configure clusters
  3. Deploy developer portal (Backstage) on platform infrastructure
  4. Integrate gitops-workflows (Argo CD) for continuous delivery
  5. Add observability integrations (Prometheus, Grafana plugins)
Golden Path Creation:
  1. Design template based on common use case
  2. Use building-ci-pipelines patterns for CI/CD configuration
  3. Apply security-hardening best practices (SAST, container scanning)
  4. Integrate secret-management (Vault, encrypted configs)
  5. Add observability instrumentation (metrics, logging, tracing)

Example Use Cases

Use Case 1: E-Commerce Platform Team

Context: 300-engineer e-commerce company, microservices architecture, manual provisioning causing bottlenecks.
Approach: Deploy Backstage, create 3 golden paths, integrate Argo CD, pilot with 3 teams, expand to 20 teams over 6 months.
Results: Onboarding time 2 days → 2 hours, deployment frequency 2x/week → 10x/day, developer NPS +35.

Use Case 2: Financial Services Platform

Context: 1500-engineer bank, strict compliance, legacy infrastructure, fragmented tooling.
Approach: Adopt Port (commercial), high standardization golden paths, OPA Gatekeeper, federated model, Terraform Cloud.
Results: Compliance audit prep 3 weeks → 3 days, infrastructure drift incidents 90% reduction, per-service cost attribution.

Use Case 3: Startup Platform

Context: 50-engineer startup, rapid growth, need fast developer onboarding.
Approach: Lightweight Backstage (2 engineers), 2 golden paths, GitHub Actions, PaaS infrastructure (Fly.io), documentation focus.
Results: New engineer to production 1 day (vs. 2 weeks), 100% self-service, 2 engineers supporting 50 developers.
For code examples and template structures, see the examples/ directory.