Guides Site Reliability Engineering: SLIs/SLOs and error budgets, reliability dashboards and burn-rate alerting, production readiness reviews (PRRs), capacity planning for availability, toil reduction, dependency and failure-mode analysis, release reliability (canaries, rollback criteria), and service-owner incident mitigation tied to customer impact. Use when defining or operating SLOs, measuring error budget burn, improving service reliability, running PRRs before launch, planning scalable, resilient capacity, or leading technical mitigation during outages. Not for CI/CD pipeline implementation (devops), incident program and paging policy design (incident-management-engineer), cloud access and patch tickets (cloud-system-administrator), load-test profiling (performance-engineer), rollout cutover strategy (deployment-strategist), or greenfield cloud build-out (cloud-engineer).
npx skill4agent add daemon-blockint-tech/agentic-enteprises-skill site-reliability-engineer

devops, incident-management-engineer, cloud-system-administrator, cloud-engineer, performance-engineer, deployment-strategist, cluster-deployment-engineer, communication-lead, vp-of-infrastructure

| Need | Skill |
|---|---|
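| CI/CD, GitOps, pipeline observability | devops |
| Incident program and paging policy | incident-management-engineer |
| Cloud day-2 operations | cloud-system-administrator |
| Cloud service implementation | cloud-engineer |
| Performance testing and tuning | performance-engineer |
| Release cutover strategy | deployment-strategist |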
| Kubernetes platform ops | |
| Data pipeline SLAs | |
| Security incidents | |
| BCP/DRP, RTO/RPO for security/IdP, ransomware recovery planning | |
| Architecture review | |
| VP infrastructure leadership | |
references/sre_scope_and_principles.md
references/sli_slo_error_budgets.md
references/observability_reliability.md
references/incident_reliability_response.md
references/capacity_toil_automation.md
references/release_reliability_chaos.md
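
The error budget and burn-rate arithmetic that this skill covers can be sketched roughly as follows. This is a minimal illustration, not taken from the skill's reference files: the function names (`error_budget`, `burn_rate`, `budget_remaining`) and the request-count inputs are assumptions chosen for the example, and it uses the common request-based definitions in which the error budget is `1 - SLO target` and the burn rate is the observed error rate divided by that budget.

```python
def error_budget(slo_target: float) -> float:
    """Fraction of requests allowed to fail under the SLO (e.g. 0.001 for 99.9%)."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return 1.0 - slo_target


def burn_rate(bad: int, total: int, slo_target: float) -> float:
    """Observed error rate divided by the error budget.

    A value of 1.0 means the budget is being consumed exactly as fast as the
    SLO allows; values above 1.0 mean it will run out before the window ends.
    """
    if total == 0:
        return 0.0  # no traffic in the window, so nothing was burned
    return (bad / total) / error_budget(slo_target)


def budget_remaining(bad: int, total: int, slo_target: float) -> float:
    """Fraction of the error budget still unspent for the given request counts.

    Negative values mean the budget is already exhausted.
    """
    if total == 0:
        return 1.0
    allowed_failures = error_budget(slo_target) * total
    return 1.0 - bad / allowed_failures


if __name__ == "__main__":
    # Hypothetical window: 10,000 requests, 50 failures, 99.9% SLO.
    print(f"burn rate: {burn_rate(50, 10_000, 0.999):.2f}")
    print(f"budget remaining: {budget_remaining(2, 10_000, 0.999):.2f}")
```

With 50 failures out of 10,000 requests against a 99.9% target, the burn rate comes out at roughly 5, meaning the budget is being spent about five times faster than the SLO can sustain. A burn-rate alert would typically compare this value against a threshold over one or more time windows; the thresholds and windows are a policy choice and are not specified here.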