holmesgpt-skill

HolmesGPT Skill


AI-powered troubleshooting for Kubernetes and cloud-native environments.


Overview


HolmesGPT is a CNCF Sandbox project that connects AI models with live observability data to investigate infrastructure problems, find root causes, and suggest remediations. It operates with read-only access and respects RBAC permissions, making it safe for production environments.


Quick Reference


Topic	Reference
Installation	`references/installation.md`
Configuration	`references/configuration.md`
Data Sources	`references/data-sources.md`
Commands	`references/commands.md`
Troubleshooting	`references/troubleshooting.md`
HTTP API	`references/http-api.md`
Integrations	`references/integrations.md`


Key Features


Root Cause Analysis: Investigates alerts and cluster issues
Multi-Source Integration: 30+ toolsets (K8s, Prometheus, Grafana)
Alert Integration: AlertManager, PagerDuty, OpsGenie, Jira, Slack
Interactive Mode: Troubleshooting with `/run`, `/show`, `/clear`
Custom Toolsets: Extend with proprietary tools via YAML configuration
CI/CD Integration: Automated deployment failure investigation


Installation Quick Start


CLI (Homebrew)


```bash
brew tap robusta-dev/homebrew-holmesgpt
brew install holmesgpt
export ANTHROPIC_API_KEY="your-key"  # or OPENAI_API_KEY
holmes ask "what pods are unhealthy?"
```


Kubernetes (Helm)


```bash
helm repo add robusta https://robusta-charts.storage.googleapis.com
helm repo update
helm install holmesgpt robusta/holmes -f values.yaml
```


Docker

```bash
docker run -it --net=host \
  -e OPENAI_API_KEY="your-key" \
  -v ~/.kube/config:/root/.kube/config \
  us-central1-docker.pkg.dev/genuine-flight-317411/devel/holmes \
  ask "what pods are crashing?"
```


Essential Commands


```bash
# Basic investigation
holmes ask "what pods are unhealthy and why?"
holmes ask "why is my deployment failing?"

# Interactive mode
holmes ask "investigate issue" --interactive

# Alert investigation
holmes investigate alertmanager --alertmanager-url http://localhost:9093
holmes investigate pagerduty --pagerduty-api-key <KEY> --update

# With file context
holmes ask "summarize the key points" -f ./logs.txt

# CI/CD integration
holmes ask "why did deployment fail?" --destination slack --slack-token <TOKEN>
```
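The CI/CD command above can be wrapped so that an investigation only runs when a deployment step fails. The following is a minimal sketch, not an official integration: the function name `investigate_on_failure` and the `SLACK_TOKEN` variable are assumptions made for illustration, and only the `holmes ask` flags already shown above are used.

```bash
# Run a deployment command; if it fails, ask HolmesGPT why and post to Slack.
# SLACK_TOKEN is a hypothetical variable name holding the Slack token.
investigate_on_failure() {
  local status=0
  # Run the wrapped command and remember its exit status without aborting.
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    holmes ask "why did deployment fail?" --destination slack --slack-token "$SLACK_TOKEN"
  fi
  # Propagate the original status so the pipeline still fails.
  return "$status"
}
```

A pipeline step might call it as `investigate_on_failure helm upgrade my-app ./chart`, where `my-app` and `./chart` are placeholders. The wrapper returns the deployment's own exit status, so the CI job still reports the failure.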

Supported AI Providers


Provider	Environment Variable	Models
Anthropic	`ANTHROPIC_API_KEY`	Sonnet 4, Opus 4.5
OpenAI	`OPENAI_API_KEY`	GPT-4.1, GPT-4o
Azure OpenAI	`AZURE_API_KEY`	GPT-4.1
AWS Bedrock	AWS credentials	Claude 3.5 Sonnet
Google Gemini	`GEMINI_API_KEY`	Gemini 1.5 Pro
Vertex AI	`VERTEXAI_PROJECT`	Gemini 1.5 Pro
Ollama	Local install	Llama 3.1, Mistral
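Before running `holmes`, it can help to confirm which provider credentials from the table are present in the shell. This is a rough sketch under stated assumptions: the function name `configured_providers` and the short provider labels are hypothetical, and AWS Bedrock and Ollama are left out because the table lists no single variable for them.

```bash
# Print the providers whose environment variable (from the table) is set.
configured_providers() {
  local found=""
  if [ -n "${ANTHROPIC_API_KEY:-}" ]; then found="$found anthropic"; fi
  if [ -n "${OPENAI_API_KEY:-}" ]; then found="$found openai"; fi
  if [ -n "${AZURE_API_KEY:-}" ]; then found="$found azure"; fi
  if [ -n "${GEMINI_API_KEY:-}" ]; then found="$found gemini"; fi
  if [ -n "${VERTEXAI_PROJECT:-}" ]; then found="$found vertexai"; fi
  # Strip the leading space left by the first append.
  echo "${found# }"
}
```

An empty result means none of the listed variables is set in the current shell.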


Basic Helm Values Structure


```yaml
# values.yaml for Kubernetes deployment
image:
  repository: robustadev/holmes
  tag: latest

env:
  - name: ANTHROPIC_API_KEY
    valueFrom:
      secretKeyRef:
        name: holmesgpt-secrets
        key: anthropic-api-key

# Model configuration
modelList:
  sonnet:
    api_key: "{{ env.ANTHROPIC_API_KEY }}"
    model: anthropic/claude-sonnet-4-20250514
    temperature: 0

# Toolsets to enable
toolsets:
  kubernetes/core:
    enabled: true
  kubernetes/logs:
    enabled: true
  prometheus/metrics:
    enabled: true

# Resources
resources:
  requests:
    memory: "1024Mi"
    cpu: "100m"
  limits:
    memory: "1024Mi"

# RBAC (read-only by default)
createServiceAccount: true
```
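The values above reference a Secret named `holmesgpt-secrets` with the key `anthropic-api-key`, which has to exist before the chart is installed. One way to create it is sketched below; the function name `create_holmes_secret` and the default namespace are assumptions, while `kubectl create secret generic` with `--from-literal` is standard kubectl.

```bash
# Create the Secret that values.yaml reads the Anthropic key from.
# The first argument is the target namespace (defaults to "default").
create_holmes_secret() {
  local namespace="${1:-default}"
  # ${VAR:?msg} aborts with the message if the key is not set.
  kubectl create secret generic holmesgpt-secrets \
    --namespace "$namespace" \
    --from-literal=anthropic-api-key="${ANTHROPIC_API_KEY:?ANTHROPIC_API_KEY must be set}"
}
```

The Secret must live in the same namespace as the Helm release, since `secretKeyRef` only resolves Secrets in the pod's own namespace.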

Interactive Mode Commands


Command	Description
`/clear`	Reset context when changing topics
`/run`	Execute custom commands and share output with AI
`/show`	Display complete tool outputs
`/context`	Review accumulated investigation information


Custom Toolset Example


```yaml
# custom-toolset.yaml
toolsets:
  my-custom-tool:
    description: "Custom diagnostic tool"
    tools:
      - name: check_service_health
        description: "Check health of a specific service"
        command: |
          curl -s http://{{ service_name }}.{{ namespace }}.svc.cluster.local/health
        parameters:
          - name: service_name
            description: "Name of the service"
          - name: namespace
            description: "Kubernetes namespace"
```


Use with: `holmes ask "check health" -t custom-toolset.yaml`


Kubernetes Annotations for Integration


```yaml
# Add to Services/Deployments for HolmesGPT context
metadata:
  annotations:
    holmesgpt.dev/runbook: |
      This service handles payment processing.
      Common issues: database connectivity, API rate limits.
      Check: kubectl logs -l app=payment-service
```

Environment Variables Reference


Variable	Description	Default
`HOLMES_CONFIG_PATH`	Config file path	`~/.holmes/config.yaml`
`HOLMES_LOG_LEVEL`	Log verbosity	`INFO`
`PROMETHEUS_URL`	Prometheus server URL	-
`GITHUB_TOKEN`	GitHub API token	-
`DATADOG_API_KEY`	DataDog API key	-
`CONFLUENCE_BASE_URL`	Confluence URL	-
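To see which values would apply in a given shell, the defaults in the table can be mirrored with ordinary parameter expansion. This is only an illustrative sketch: `holmes_effective_settings` is a hypothetical helper, and it reproduces the documented defaults rather than querying HolmesGPT itself.

```bash
# Print the settings HolmesGPT would see, using the table's defaults
# for anything that is not set in the environment.
holmes_effective_settings() {
  echo "HOLMES_CONFIG_PATH=${HOLMES_CONFIG_PATH:-$HOME/.holmes/config.yaml}"
  echo "HOLMES_LOG_LEVEL=${HOLMES_LOG_LEVEL:-INFO}"
  # PROMETHEUS_URL has no default in the table, so show a placeholder.
  echo "PROMETHEUS_URL=${PROMETHEUS_URL:-<unset>}"
}
```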


Best Practices


Use Specific Queries: Include namespace, deployment name, symptoms
Start with Claude Sonnet 4.0/4.5: Best accuracy for complex investigations
Enable Relevant Toolsets: Only enable what you need to reduce noise
Use Interactive Mode: For complex multi-step investigations
Set Up Runbooks: Provide context for known alert types
CI/CD Integration: Automate deployment failure analysis
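The first practice, a specific query, amounts to always stating the namespace, the deployment name, and the symptom. A small helper can enforce that shape; the function name `build_query` and the exact wording of the prompt are assumptions for illustration, not a format HolmesGPT requires.

```bash
# Build a specific question from namespace, deployment name, and symptom.
build_query() {
  local namespace="$1" deployment="$2" symptom="$3"
  printf 'In namespace %s, deployment %s shows: %s. What is the root cause?\n' \
    "$namespace" "$deployment" "$symptom"
}
```

It could be used as `holmes ask "$(build_query payments payment-service 'pods restarting with OOMKilled')"`, where the namespace and deployment names are placeholders.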


Security Considerations


HolmesGPT uses read-only access (`get`, `list`, `watch` only)
Respects existing RBAC permissions
Never modifies, creates, or deletes resources
API keys stored in Kubernetes Secrets
Data not used for model training
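The read-only claim can be checked against a live cluster with `kubectl auth can-i`, which exits 0 when an action is allowed and non-zero otherwise. The sketch below is one possible check; the function name `verify_read_only` is hypothetical, and the identity to test (for example a service account such as `system:serviceaccount:default:holmes`) is an assumption that depends on how the chart was installed.

```bash
# Fail if the given identity may perform any write verb on a resource.
# $1: identity to impersonate, $2: resource type (defaults to pods).
verify_read_only() {
  local subject="$1" resource="${2:-pods}" verb
  for verb in create update patch delete; do
    # can-i exits 0 when the verb is permitted for the impersonated identity.
    if kubectl auth can-i "$verb" "$resource" --as="$subject" >/dev/null 2>&1; then
      echo "WARNING: $subject can $verb $resource"
      return 1
    fi
  done
  echo "OK: $subject has no write verbs on $resource"
}
```

Running the check requires that the caller is itself allowed to impersonate the identity passed in `--as`.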


Official Resources


Documentation: https://holmesgpt.dev/
GitHub: https://github.com/robusta-dev/holmesgpt
Helm Chart: https://github.com/robusta-dev/holmesgpt/tree/master/helm/holmes
Slack Community: Cloud Native Slack
