Found 24 Skills
Execute chaos engineering experiments to test system resilience. Use when performing specialized testing. Trigger with phrases like "run chaos tests", "test resilience", or "inject failures".
Apply Gremlin's enterprise chaos engineering methodology. Emphasizes categorized failure injection, safety controls, and structured experimentation. Use when implementing chaos engineering in enterprise environments with compliance requirements.
Testing in production with feature flags, canary deployments, synthetic monitoring, and chaos engineering. Use when implementing production observability or progressive delivery.
Advanced testing strategies and methodologies. Use when user asks to "design tests", "test coverage", "property-based testing", "mutation testing", "contract testing", "chaos engineering", "test pyramid", "testing strategy", "behavior-driven development", "acceptance testing", or mentions comprehensive testing approaches.
Test at extremes (1000x bigger or smaller, instant or year-long) to expose fundamental truths hidden at normal scales.
Use when the user wants to deploy and run a prepared AWS FIS experiment. Triggers on "execute FIS experiment", "run FIS experiment", "start chaos experiment", "deploy FIS template", "启动 FIS 实验" (start FIS experiment), "运行混沌实验" (run chaos experiment), "执行故障注入实验" (execute fault injection experiment), "deploy and run the experiment in [directory]". Expects a prepared experiment directory (from aws-fis-experiment-prepare or created manually) containing experiment-template.json, iam-policy.json, cfn-template.yaml, and alarm configs. Deploys resources via the CLI or CloudFormation, starts the experiment only after strict user confirmation, monitors progress, and generates a results report.
Use when the user asks about chaos engineering, fault injection, resilience testing, or HA verification for a SPECIFIC AWS service (e.g., RDS, EKS, MSK, ElastiCache, DynamoDB, S3, Lambda, OpenSearch). Triggers on "chaos testing on [service]", "fault injection for [service]", "how to test HA of [service]", "FIS scenarios/actions for [service]", "[service] failover testing", "[service] resilience testing", "[service] 混沌测试" ([service] chaos testing), "[service] 故障注入" ([service] fault injection), "[service] 高可用验证" ([service] HA verification), "对 [service] 做混沌实验" (run a chaos experiment on [service]), "test my [service]", "verify my [service] is resilient". Use this skill even when the user phrases it casually, like "test my RDS" or "how resilient is my MSK cluster".
Expert knowledge for Chaos Studio development, including troubleshooting, limits and quotas, security, configuration, and integrations and coding patterns. Use when defining ARM/Bicep experiments, deploying Chaos Agents, using the CLI or REST API, integrating with Azure Monitor, or handling other Chaos Studio-related development tasks. Not for Azure Monitor (use azure-monitor), Azure Resiliency (use azure-resiliency), Azure Reliability (use azure-reliability), or Azure Site Recovery (use azure-site-recovery).
Design and implement disaster recovery strategies with RTO/RPO planning, database backups, Kubernetes DR, cross-region replication, and chaos engineering testing. Use when implementing backup systems, configuring point-in-time recovery, setting up multi-region failover, or validating DR procedures.
Injects managed chaos into environments to test system resilience. Validates that self-healing and monitoring systems work as expected under stress.
Expert Site Reliability Engineer specializing in SLOs, error budgets, and reliability engineering practices. Proficient in incident management, post-mortems, capacity planning, and building scalable, resilient systems with a focus on reliability, availability, and performance.
Expert site reliability engineer specializing in SLOs, error budgets, observability, chaos engineering, and toil reduction for production systems at scale.