fleet-management
Grafana Fleet Management and Alloy Configuration
Fleet Management lets you author pipeline configurations once and distribute them to many Alloy
collectors remotely via OpAMP. Collectors poll for updates and apply new configurations without
a restart.
Key concepts:
- Collector - an Alloy agent instance, identified by a unique ID and a set of attributes
- Pipeline - a named Alloy configuration stored in Fleet Management (Alloy configuration syntax, not YAML)
- Matcher - a label selector that maps a pipeline to matching collectors
- Attributes - key/value labels on a collector used for targeting (e.g. `env=production`)
Step 1: Check the current state
```bash
BASE=https://fleet-management-prod-us-east-0.grafana.net
TOKEN="<STACK_ID>:<API_TOKEN>"

# List all registered collectors and their health status
curl -s -X POST "$BASE/collector.v1.CollectorService/ListCollectors" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq '.collectors[] | {id, name, remoteConfigStatus}'

# List all pipelines
curl -s -X POST "$BASE/pipeline.v1.PipelineService/ListPipelines" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}'
```

In the Grafana Cloud UI: **Connections > Collector > Fleet Management > Collector Inventory**.
Healthy collectors show `REMOTE_CONFIG_STATUS_APPLIED`. Degraded collectors show
`REMOTE_CONFIG_STATUS_FAILED` with a `remoteConfigStatusMessage` describing the error.
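
The two status values above lend themselves to a small triage helper. The following is a minimal shell sketch, assuming only the two status strings named in this step; the function names `classify_status` and `tally_statuses` and the labels `healthy`, `degraded`, and `other` are illustrative and not part of any Grafana tooling. Any other status the API may return is reported as `other`.

```bash
# classify_status STATUS
# Map a remoteConfigStatus string to a coarse label. Only the two values
# named in this guide are recognized; anything else (including an empty
# string) falls through to "other".
classify_status() {
  case "$1" in
    REMOTE_CONFIG_STATUS_APPLIED) echo "healthy" ;;
    REMOTE_CONFIG_STATUS_FAILED)  echo "degraded" ;;
    *)                            echo "other" ;;
  esac
}

# tally_statuses
# Read status strings from stdin, one per line, and print a count per label.
tally_statuses() {
  healthy=0
  degraded=0
  other=0
  while IFS= read -r status; do
    case "$(classify_status "$status")" in
      healthy)  healthy=$((healthy + 1)) ;;
      degraded) degraded=$((degraded + 1)) ;;
      *)        other=$((other + 1)) ;;
    esac
  done
  echo "healthy=$healthy degraded=$degraded other=$other"
}
```

One way to use it is to pipe the `ListCollectors` response through `jq -r '.collectors[].remoteConfigStatus'` and then into `tally_statuses`.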

---

Step 2: Understand the pipeline configuration format
Pipelines are valid Alloy configuration files. Alloy uses an HCL-like configuration syntax (formerly named River).

```alloy
// Basic metrics pipeline: scrape Prometheus metrics and forward to Grafana Cloud
prometheus.scrape "default" {
  targets         = discovery.relabel.filtered.output
  forward_to      = [prometheus.remote_write.grafana_cloud.receiver]
  scrape_interval = "60s"
}

prometheus.remote_write "grafana_cloud" {
  endpoint {
    url = "https://prometheus-prod-01-eu-west-0.grafana.net/api/prom/push"

    basic_auth {
      username = "<METRICS_USERNAME>"
      password = env("GRAFANA_CLOUD_API_KEY")
    }
  }
}
```

Key Alloy component categories:
| Category | Example components |
|---|---|
| Discovery | |
| Metrics | |
| Logs | |
| Traces | |
| Profiles | |
| Transformation | |
Reference: Alloy component documentation
Step 3: Create a pipeline
```bash
# Create a pipeline via API (contents is plain text Alloy config, not base64)
curl -s -X POST "$BASE/pipeline.v1.PipelineService/CreatePipeline" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "name": "k8s-metrics", "contents": "prometheus.scrape \"default\" {\n  targets = []\n  forward_to = []\n}", "matchers": [ {"name": "env", "value": "production", "type": "EQUAL"} ] }'
```

In the UI: **Fleet Management > Remote Configuration > Create pipeline**. The wizard offers:
1. Start from a template (Kubernetes, host metrics, logs, traces, profiles)
2. Duplicate an existing pipeline
3. Write from scratch with the inline editor
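
Because `contents` is an Alloy configuration embedded in a JSON string, its double quotes and newlines have to be escaped before the request is sent. The following bash-specific sketch shows one way to do that without extra tools. The helper names `json_escape` and `create_pipeline_body` are hypothetical, the body mirrors the request shape used in this step, and control characters other than newline, carriage return, and tab are not handled.

```bash
# json_escape STRING
# Escape a string for use inside a JSON string literal (bash only).
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# create_pipeline_body NAME CONTENTS MATCHER_NAME MATCHER_VALUE
# Print a CreatePipeline request body with a single EQUAL matcher.
create_pipeline_body() {
  printf '{"name":"%s","contents":"%s","matchers":[{"name":"%s","value":"%s","type":"EQUAL"}]}' \
    "$(json_escape "$1")" "$(json_escape "$2")" \
    "$(json_escape "$3")" "$(json_escape "$4")"
}
```

The result can be passed to curl as `-d "$(create_pipeline_body k8s-metrics "$(cat pipeline.alloy)" env production)"`, where `pipeline.alloy` is a hypothetical local file holding the pipeline contents.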

---

Step 4: Assign pipelines to collectors with matchers
Matchers use label selectors to map a pipeline to collectors. A collector receives all pipelines
whose matchers match its attributes.
```json
{
  "matchers": [
    "env=\"production\"",
    "team=\"platform\""
  ]
}
```

This assigns the pipeline to any collector with both `env=production` AND `team=platform`.

Matcher syntax:

| Operator | Example | Meaning |
|---|---|---|
| `=` | `env="production"` | Exact match |
| `!=` | | Not equal |
| `=~` | | Regex match |
| `!~` | | Regex not match |

Apply matchers when creating or updating a pipeline:
```bash
# Matchers are set in CreatePipeline or UpdatePipeline
curl -s -X POST "$BASE/pipeline.v1.PipelineService/UpdatePipeline" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "id": "<PIPELINE_ID>", "matchers": [ {"name": "env", "value": "production", "type": "EQUAL"}, {"name": "team", "value": "platform", "type": "EQUAL"} ] }'
```

Matcher `type` values: `EQUAL`, `NOT_EQUAL`, `REGEX`, `NOT_REGEX`
A pipeline with no matchers is saved but deployed to zero collectors.
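
To reason about which collectors a pipeline would reach before deploying it, the four matcher types can be evaluated locally. The following is a minimal sketch, not the service's implementation: the function name `matcher_matches` is hypothetical, regexes are evaluated with `grep -E -x` (whole-value match), and it is an assumption that Fleet Management anchors and interprets regexes the same way.

```bash
# matcher_matches ACTUAL TYPE EXPECTED
# Return 0 when the attribute value ACTUAL satisfies the matcher,
# 1 when it does not, and 2 for an unknown matcher type.
matcher_matches() {
  actual=$1
  type=$2
  expected=$3
  case "$type" in
    EQUAL)     [ "$actual" = "$expected" ] ;;
    NOT_EQUAL) [ "$actual" != "$expected" ] ;;
    # grep -x requires the regex to match the whole value
    REGEX)     printf '%s\n' "$actual" | grep -Eqx -- "$expected" ;;
    NOT_REGEX) ! printf '%s\n' "$actual" | grep -Eqx -- "$expected" ;;
    *)         return 2 ;;
  esac
}
```

Since a pipeline applies only when all of its matchers match, several checks can be chained with `&&`, for example `matcher_matches "$env" EQUAL production && matcher_matches "$team" EQUAL platform`.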

---

Step 5: Set collector attributes
Attributes are the labels that matchers target. Set them from the UI (Collector Inventory > select
collector > Edit attributes) or via API:
```bash
curl -s -X POST "$BASE/collector.v1.CollectorService/UpdateCollector" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "<COLLECTOR_ID>",
    "attributes": [
      {"name": "env", "value": "production"},
      {"name": "team", "value": "platform"},
      {"name": "region", "value": "us-east-1"}
    ]
  }'
```

Alloy sets some attributes automatically on registration:
- `platform` - OS platform (linux, darwin, windows)
- `arch` - CPU architecture (amd64, arm64)
- `alloy_version` - Alloy version string

Custom attributes must be set explicitly, either via the API or in the collector's startup config.
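
When attributes are set from a script, the `attributes` array can be generated from `key=value` pairs instead of being written by hand. The following is a minimal sketch; the function name `attributes_json` is hypothetical, and it assumes keys and values contain no characters that need JSON escaping (such as double quotes or backslashes).

```bash
# attributes_json KEY=VALUE...
# Print a JSON array of {"name","value"} objects in argument order.
# Everything before the first "=" is the name; the rest is the value.
attributes_json() {
  out="["
  sep=""
  for pair in "$@"; do
    out="${out}${sep}{\"name\":\"${pair%%=*}\",\"value\":\"${pair#*=}\"}"
    sep=","
  done
  printf '%s]' "$out"
}
```

The output can be spliced into the `UpdateCollector` body, for example `-d "{\"id\": \"<COLLECTOR_ID>\", \"attributes\": $(attributes_json env=production team=platform)}"`.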
Step 6: Install Alloy with remote configuration enabled
For Alloy to receive remote configuration from Fleet Management, it needs:
- An API token with Fleet Management access
- The `remotecfg` block in its local (bootstrap) configuration

```alloy
// bootstrap.alloy -- the only local config file Alloy needs
remotecfg {
  url = "https://<FLEET_MANAGEMENT_HOST>"

  basic_auth {
    username = "<STACK_ID>"
    password = env("GRAFANA_CLOUD_API_KEY")
  }

  poll_frequency = "1m"

  // Attributes for this collector instance
  attributes = {
    "env"    = env("ENVIRONMENT"),
    "team"   = "platform",
    "region" = env("AWS_REGION"),
  }
}
```

Kubernetes deployment:
```yaml
# values.yaml for grafana/alloy Helm chart
alloy:
  configMap:
    content: |
      remotecfg {
        url = "https://<FLEET_MANAGEMENT_HOST>"
        basic_auth {
          username = "<STACK_ID>"
          password = env("GRAFANA_CLOUD_API_KEY")
        }
        poll_frequency = "1m"
        attributes = {
          "env"     = "production",
          "cluster" = env("CLUSTER_NAME"),
        }
      }
  extraEnv:
    - name: GRAFANA_CLOUD_API_KEY
      valueFrom:
        secretKeyRef:
          name: grafana-cloud-credentials
          key: api-key
```
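
Both configurations in this step read credentials and attribute values from environment variables, so it can help to confirm they are exported before Alloy starts. The following is a minimal sketch of such a preflight check; the function name `missing_env` is hypothetical, and it relies on the `printenv` utility being available on the host.

```bash
# missing_env VAR...
# Print the name of every listed variable that is not exported or is empty,
# one per line. Returns 1 if any are missing, 0 otherwise.
missing_env() {
  missing=0
  for name in "$@"; do
    # printenv only sees exported variables, which is what a child
    # process such as Alloy would inherit
    if [ -z "$(printenv "$name")" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}
```

For the bootstrap file above, a wrapper script might run `missing_env GRAFANA_CLOUD_API_KEY ENVIRONMENT AWS_REGION || exit 1` before launching Alloy.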

---

Step 7: Troubleshoot collector health
Check remote config status:
```bash
# List collectors with FAILED status
curl -s -X POST "$BASE/collector.v1.CollectorService/ListCollectors" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{}' | jq '.collectors[] | select(.remoteConfigStatus == "REMOTE_CONFIG_STATUS_FAILED") | {id, name, remoteConfigStatusMessage}'
```

**Common failure patterns:**
| Status message | Root cause | Fix |
|---|---|---|
| `syntax error at line N` | Invalid Alloy River syntax | Fix the pipeline configuration; validate before deploying |
| `component not found: X` | Alloy version too old for a component | Upgrade Alloy or use a component that the installed version supports |
| `failed to unmarshal config` | Pipeline contents malformed, for example base64-encoded instead of plain text | Resend the contents as plain-text Alloy configuration |
| `authentication failed` | Wrong API token | Rotate and re-apply the token |
| `connection refused` | Collector can't reach Fleet Management | Check network/firewall rules |
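
The table above can be turned into a first-pass triage helper for `remoteConfigStatusMessage` values. The following is a minimal sketch: the function name `suggest_fix` and the wording of the hints are illustrative, and the substrings are taken from the table rather than from an exhaustive list of messages Alloy can emit.

```bash
# suggest_fix MESSAGE
# Print a short hint for a status message by substring match, in the
# same order as the failure-pattern table.
suggest_fix() {
  case "$1" in
    *"syntax error"*)          echo "fix the pipeline configuration syntax" ;;
    *"component not found"*)   echo "upgrade Alloy or change the component" ;;
    *"failed to unmarshal"*)   echo "check how the pipeline contents were encoded" ;;
    *"authentication failed"*) echo "check the API token" ;;
    *"connection refused"*)    echo "check network and firewall rules" ;;
    *)                         echo "no known pattern; inspect the Alloy logs" ;;
  esac
}
```

Messages that match none of the patterns fall through to the log inspection described next.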
**Check Alloy logs directly:**
```bash
# Kubernetes
kubectl logs -n monitoring -l app.kubernetes.io/name=alloy --tail=50 | grep -iE "remote|error"

# Systemd
journalctl -u alloy --since "1h ago" | grep -iE "remote|error"
```

**Check the Alloy UI** (port 12345 by default) at `http://<COLLECTOR_HOST>:12345`:
- **Graph** tab: shows component wiring and health per component
- **Components** tab: lists all components and their current config
- **Clustering** tab: shows clustering state if enabled

---

Step 8: Use the Grafana Assistant for pipeline work
The Grafana Assistant understands Fleet Management and can:
- Explain what a pipeline configuration does
- Identify syntax errors and suggest fixes
- Optimize pipelines for performance or cost
- Generate Mermaid diagrams of component wiring
Via the UI: In the Remote Configuration page, select a pipeline and click the Assistant button.
Options: Explain, Validate/Fix, Optimize, Visualize.
Via API (for automation): The Assistant exposes Fleet Management tools:
- `fleetManagementRead` - list collectors and pipelines
- `fleetManagementWrite` - update pipeline configurations
- `alloyConfigValidation` - validate Alloy River syntax