
Grafana Mimir Skill

Comprehensive guide to Grafana Mimir - the horizontally scalable, highly available, multi-tenant time series database for long-term Prometheus metrics storage.

What is Mimir?

Mimir is an open-source, horizontally scalable, highly available, multi-tenant long-term storage solution for Prometheus and OpenTelemetry metrics that:
  • Overcomes Prometheus limitations - Scalability and long-term retention
  • Multi-tenant by default - Built-in tenant isolation via the `X-Scope-OrgID` header
  • Stores data in object storage - S3, GCS, Azure Blob Storage, or Swift
  • 100% Prometheus compatible - PromQL queries, remote write protocol
  • Part of the LGTM+ Stack - Logs, Grafana, Traces, Metrics unified observability

Architecture Overview

Core Components

| Component | Purpose |
|---|---|
| Distributor | Validates requests, routes incoming metrics to ingesters via hash ring |
| Ingester | Stores time-series data in memory, flushes to object storage |
| Querier | Executes PromQL queries against ingesters and store-gateways |
| Query Frontend | Caches query results, optimizes and splits queries |
| Query Scheduler | Manages per-tenant query queues for fairness |
| Store-Gateway | Provides access to historical metric blocks in object storage |
| Compactor | Consolidates and optimizes stored metric data blocks |
| Ruler | Evaluates recording and alerting rules (optional) |
| Alertmanager | Handles alert routing and deduplication (optional) |

Data Flow

Write Path:

```
Prometheus/OTel → Distributor → Ingester → Object Storage
                  Hash Ring
                  (routes by series)
```

Read Path:

```
Query → Query Frontend → Query Scheduler → Querier
                                    Ingesters (recent)
                                    Store-Gateway (historical)
```
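The write path's hash-ring routing can be sketched in miniature. This is an illustrative toy, not Mimir's implementation (the real ring assigns many tokens per ingester and is zone-aware); the names `pick_ingesters` and `_token` are invented for the sketch:

```python
import hashlib

def _token(s: str) -> int:
    # Position on the ring, derived from a stable hash.
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

def pick_ingesters(series_labels: str, ingesters: list[str], replication_factor: int = 3) -> list[str]:
    """Walk clockwise from the series' ring position, collecting
    `replication_factor` distinct ingesters."""
    ring = sorted(ingesters, key=_token)
    tokens = [_token(i) for i in ring]
    pos = _token(series_labels)
    # First ingester whose token is >= the series position (wrap to 0).
    start = next((idx for idx, t in enumerate(tokens) if t >= pos), 0)
    return [ring[(start + k) % len(ring)] for k in range(replication_factor)]

owners = pick_ingesters('{__name__="up",job="node"}', [f"ingester-{n}" for n in range(4)])
print(owners)
```

Because routing depends only on the hash of the label set, every distributor replica makes the same placement decision without coordination.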

Deployment Modes

1. Monolithic Mode (`-target=all`)

  • All components run in a single process
  • Best for: development, testing, small scale (~1M series)
  • Horizontally scalable by deploying multiple instances
  • Not recommended for large scale (all components scale together)

2. Microservices Mode (Distributed) - Recommended for Production

```yaml
# Using the mimir-distributed Helm chart
distributor:
  replicas: 3
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
querier:
  replicas: 3
queryFrontend:
  replicas: 2
queryScheduler:
  replicas: 2
storeGateway:
  replicas: 3
compactor:
  replicas: 1
```

Helm Deployment

Add Repository

```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
```

Install Distributed Mimir

```bash
helm install mimir grafana/mimir-distributed \
  --namespace monitoring \
  --values values.yaml
```

Pre-Built Values Files

| File | Purpose |
|---|---|
| values.yaml | Non-production testing with MinIO |
| small.yaml | ~1 million series (single replicas, not HA) |
| large.yaml | Production (~10 million series) |

Production Values Example

```yaml
# Deployment mode
mimir:
  structuredConfig:
    multitenancy_enabled: true

    # Storage configuration
    common:
      storage:
        backend: azure  # or s3, gcs
        azure:
          account_name: ${AZURE_STORAGE_ACCOUNT}
          account_key: ${AZURE_STORAGE_KEY}
          endpoint_suffix: blob.core.windows.net

    blocks_storage:
      azure:
        container_name: mimir-blocks

    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager

    ruler_storage:
      azure:
        container_name: mimir-ruler

# Distributor
distributor:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 4Gi

# Ingester
ingester:
  replicas: 3
  zoneAwareReplication:
    enabled: true
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 2
      memory: 8Gi
    limits:
      memory: 16Gi

# Querier
querier:
  replicas: 3
  resources:
    requests:
      cpu: 1
      memory: 2Gi
    limits:
      memory: 8Gi

# Query Frontend
query_frontend:
  replicas: 2
  resources:
    requests:
      cpu: 500m
      memory: 1Gi
    limits:
      memory: 2Gi

# Query Scheduler
query_scheduler:
  replicas: 2

# Store Gateway
store_gateway:
  replicas: 3
  persistentVolume:
    enabled: true
    size: 20Gi
  resources:
    requests:
      cpu: 500m
      memory: 2Gi
    limits:
      memory: 8Gi

# Compactor
compactor:
  replicas: 1
  persistentVolume:
    enabled: true
    size: 50Gi
  resources:
    requests:
      cpu: 1
      memory: 4Gi
    limits:
      memory: 8Gi

# Gateway for external access
gateway:
  enabledNonEnterprise: true
  replicas: 2

# Monitoring
metaMonitoring:
  serviceMonitor:
    enabled: true
```

Storage Configuration

Critical Requirements

  • Must create buckets manually - Mimir doesn't create them
  • Separate buckets required - blocks_storage, alertmanager_storage, and ruler_storage cannot share the same bucket+prefix
  • Azure: hierarchical namespace must be disabled

Azure Blob Storage

```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: azure
        azure:
          account_name: <storage-account-name>
          # Option 1: Account Key (via environment variable)
          account_key: ${AZURE_STORAGE_KEY}
          # Option 2: User-Assigned Managed Identity
          # user_assigned_id: <identity-client-id>
          endpoint_suffix: blob.core.windows.net

    blocks_storage:
      azure:
        container_name: mimir-blocks

    alertmanager_storage:
      azure:
        container_name: mimir-alertmanager

    ruler_storage:
      azure:
        container_name: mimir-ruler
```

AWS S3

```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: s3
        s3:
          endpoint: s3.us-east-1.amazonaws.com
          region: us-east-1
          access_key_id: ${AWS_ACCESS_KEY_ID}
          secret_access_key: ${AWS_SECRET_ACCESS_KEY}

    blocks_storage:
      s3:
        bucket_name: mimir-blocks

    alertmanager_storage:
      s3:
        bucket_name: mimir-alertmanager

    ruler_storage:
      s3:
        bucket_name: mimir-ruler
```

Google Cloud Storage

```yaml
mimir:
  structuredConfig:
    common:
      storage:
        backend: gcs
        gcs:
          service_account: ${GCS_SERVICE_ACCOUNT_JSON}

    blocks_storage:
      gcs:
        bucket_name: mimir-blocks

    alertmanager_storage:
      gcs:
        bucket_name: mimir-alertmanager

    ruler_storage:
      gcs:
        bucket_name: mimir-ruler
```

Limits Configuration

```yaml
mimir:
  structuredConfig:
    limits:
      # Ingestion limits
      ingestion_rate: 25000                    # Samples/sec per tenant
      ingestion_burst_size: 50000              # Burst size
      max_series_per_metric: 10000
      max_series_per_user: 1000000
      max_global_series_per_user: 1000000
      max_label_names_per_series: 30
      max_label_name_length: 1024
      max_label_value_length: 2048

      # Query limits
      max_fetched_series_per_query: 100000
      max_fetched_chunks_per_query: 2000000
      max_query_lookback: 0                    # No limit
      max_query_parallelism: 32

      # Retention
      compactor_blocks_retention_period: 365d  # 1 year

      # Out-of-order samples
      out_of_order_time_window: 5m
```
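To translate these limits into capacity numbers: each active series produces one sample per scrape interval, so the per-tenant `ingestion_rate` can be estimated as below. This is a sizing heuristic with an assumed 1.5x headroom factor, not an official formula:

```python
def required_ingestion_rate(active_series: int, scrape_interval_s: float, headroom: float = 1.5) -> int:
    """One sample per active series per scrape interval; headroom covers
    bursts and recording-rule output."""
    return int(active_series * headroom / scrape_interval_s)

# 1,000,000 active series scraped every 15s, with 1.5x headroom:
print(required_ingestion_rate(1_000_000, 15))  # 100000
```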

Per-Tenant Overrides (Runtime Configuration)

runtime-config.yaml:

```yaml
overrides:
  tenant1:
    ingestion_rate: 50000
    max_series_per_user: 2000000
    compactor_blocks_retention_period: 730d  # 2 years
  tenant2:
    ingestion_rate: 75000
    max_global_series_per_user: 5000000
```

Enable runtime configuration:

```yaml
mimir:
  structuredConfig:
    runtime_config:
      file: /etc/mimir/runtime-config.yaml
      period: 10s
```

High Availability Configuration

HA Tracker for Prometheus Deduplication

```yaml
mimir:
  structuredConfig:
    distributor:
      ha_tracker:
        enable_ha_tracker: true
        kvstore:
          store: memberlist
        cluster_label: cluster
        replica_label: __replica__

    memberlist:
      join_members:
        - mimir-gossip-ring.monitoring.svc.cluster.local:7946
```

Prometheus Configuration:

```yaml
global:
  external_labels:
    cluster: prom-team1
    __replica__: replica1

remote_write:
  - url: http://mimir-gateway:8080/api/v1/push
    headers:
      X-Scope-OrgID: my-tenant
```
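The deduplication the HA tracker provides reduces to a simple rule: per `cluster` label one replica is elected, and samples from the other replica are dropped. A toy sketch of that rule (illustrative only; the real tracker also fails over to another replica when the elected one stops sending, which is omitted here):

```python
# Elected replica per HA cluster, keyed by the `cluster` external label.
elected: dict[str, str] = {}

def accept_sample(cluster: str, replica: str) -> bool:
    """Accept samples only from the currently elected replica;
    the first replica seen for a cluster wins the election."""
    leader = elected.setdefault(cluster, replica)
    return leader == replica

print(accept_sample("prom-team1", "replica1"))  # True  (elected)
print(accept_sample("prom-team1", "replica2"))  # False (deduplicated HA pair)
```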

Zone-Aware Replication

```yaml
ingester:
  zoneAwareReplication:
    enabled: true
    zones:
      - name: zone-a
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1a
      - name: zone-b
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1b
      - name: zone-c
        nodeSelector:
          topology.kubernetes.io/zone: us-east-1c

store_gateway:
  zoneAwareReplication:
    enabled: true
```

Shuffle Sharding

Limits tenant data to a subset of instances for fault isolation:

```yaml
mimir:
  structuredConfig:
    limits:
      # Write path
      ingestion_tenant_shard_size: 3

      # Read path
      max_queriers_per_tenant: 5
      store_gateway_tenant_shard_size: 3
```
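A common way to implement shuffle sharding is to rank all instances by a hash seeded with the tenant ID and take the first `shard_size`; each tenant then gets a stable, pseudo-random subset, and two tenants rarely share their full subset. This sketch shows the idea (illustrative, not Mimir's exact algorithm):

```python
import hashlib

def tenant_shard(tenant_id: str, instances: list[str], shard_size: int) -> list[str]:
    """Deterministically pick `shard_size` instances for a tenant by
    ranking every instance on a hash of (tenant, instance)."""
    ranked = sorted(instances, key=lambda i: hashlib.sha256(f"{tenant_id}/{i}".encode()).hexdigest())
    return ranked[:shard_size]

pool = [f"ingester-{n}" for n in range(10)]
print(tenant_shard("tenant-a", pool, 3))
```

An outage of one tenant's shard therefore leaves most other tenants untouched, which is the fault-isolation property the config above buys.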

OpenTelemetry Integration

OTLP Metrics Ingestion

OpenTelemetry Collector Config:

```yaml
exporters:
  otlphttp:
    endpoint: http://mimir-gateway:8080/otlp
    headers:
      X-Scope-OrgID: "my-tenant"

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp]
```

Exponential Histograms (Experimental)

```go
// Go SDK configuration
Aggregation: metric.AggregationBase2ExponentialHistogram{
    MaxSize:  160,      // Maximum buckets
    MaxScale: 20,       // Scale factor
}
```

Key Benefits:
  • Explicit min/max values (no estimation needed)
  • Better accuracy for extreme percentiles
  • Native OTLP format preservation
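The bucket layout behind base-2 exponential histograms is easy to compute: at `scale = s` the bucket base is `2^(2^-s)`, and a value `v` lands in the bucket `(base^index, base^(index+1)]`. A small sketch of the index formula from the OpenTelemetry data model (`bucket_index` is our name for it):

```python
import math

def bucket_index(value: float, scale: int) -> int:
    """Bucket index for a positive value in a base-2 exponential
    histogram: boundaries are powers of base = 2**(2**-scale)."""
    return math.ceil(math.log2(value) * (1 << scale)) - 1

# At scale 3 the base is 2**(1/8) ≈ 1.09, i.e. roughly 9% relative
# precision per bucket; 10.0 falls in bucket 26: (2**3.25, 2**3.375].
print(bucket_index(10.0, 3))  # 26
```

Higher `MaxScale` means finer buckets; the SDK lowers the scale automatically when `MaxSize` buckets can't cover the observed range.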

Multi-Tenancy

```yaml
mimir:
  structuredConfig:
    multitenancy_enabled: true
    no_auth_tenant: anonymous    # Used when multitenancy is disabled
```

Query with tenant header:

```bash
curl -H "X-Scope-OrgID: tenant-a" \
  "http://mimir:8080/prometheus/api/v1/query?query=up"
```

Tenant ID Constraints:
  • Max 150 characters
  • Allowed: alphanumeric characters plus `!` `-` `_` `.` `*` `'` `(` `)`
  • Prohibited: `.` or `..` alone, `__mimir_cluster`, slashes
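The constraints above can be checked mechanically before provisioning a tenant; a small validator following the documented rules (a sketch, not Mimir's own validation code):

```python
import re

# Alphanumerics plus the documented special characters.
ALLOWED = re.compile(r"[A-Za-z0-9!\-_.*'()]+")

def valid_tenant_id(tenant: str) -> bool:
    """Apply the documented tenant ID constraints: <=150 chars,
    restricted alphabet, no '.', '..', '__mimir_cluster', no slashes."""
    if not tenant or len(tenant) > 150:
        return False
    if tenant in (".", "..", "__mimir_cluster"):
        return False
    if "/" in tenant or "\\" in tenant:
        return False
    return ALLOWED.fullmatch(tenant) is not None

print(valid_tenant_id("tenant-a"))  # True
print(valid_tenant_id(".."))        # False
```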

API Reference

Ingestion Endpoints

```bash
# Prometheus remote write
POST /api/v1/push

# OTLP metrics
POST /otlp/v1/metrics

# InfluxDB line protocol
POST /api/v1/push/influx/write
```

Query Endpoints

```bash
# Instant query
GET,POST /prometheus/api/v1/query?query=<promql>&time=<timestamp>

# Range query
GET,POST /prometheus/api/v1/query_range?query=<promql>&start=<start>&end=<end>&step=<step>

# Labels
GET,POST /prometheus/api/v1/labels
GET      /prometheus/api/v1/label/{name}/values

# Series
GET,POST /prometheus/api/v1/series

# Exemplars
GET,POST /prometheus/api/v1/query_exemplars

# Cardinality
GET,POST /prometheus/api/v1/cardinality/label_names
GET,POST /prometheus/api/v1/cardinality/active_series
```
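For range queries, the `start`/`end`/`step` parameters determine how many times the PromQL expression is evaluated, which is the main driver of query cost:

```python
def range_query_points(start: int, end: int, step: int) -> int:
    """Number of evaluation timestamps a range query produces:
    start, start+step, ..., up to and including end."""
    return (end - start) // step + 1

# A 1-hour window at a 15s step evaluates the expression 241 times.
print(range_query_points(0, 3600, 15))  # 241
```

Wide windows at small steps multiply quickly, which is why the query frontend splits and caches range queries and why `max_query_parallelism` matters.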
undefined

Administrative Endpoints

管理端点

bash
undefined
bash
undefined

Flush ingester data

刷新Ingester数据

GET,POST /ingester/flush
GET,POST /ingester/flush

Prepare shutdown

准备关机

GET,POST,DELETE /ingester/prepare-shutdown
GET,POST,DELETE /ingester/prepare-shutdown

Ring status

哈希环状态

GET /ingester/ring GET /distributor/ring GET /store-gateway/ring GET /compactor/ring
GET /ingester/ring GET /distributor/ring GET /store-gateway/ring GET /compactor/ring

Tenant stats

租户统计

GET /distributor/all_user_stats GET /api/v1/user_stats GET /api/v1/user_limits
undefined
GET /distributor/all_user_stats GET /api/v1/user_stats GET /api/v1/user_limits
undefined

Health & Config

```bash
GET /ready
GET /metrics
GET /config
GET /config?mode=diff
GET /runtime_config
```

Azure Identity Configuration

User-Assigned Managed Identity

1. Create Identity:

```bash
az identity create \
  --name mimir-identity \
  --resource-group <rg>

IDENTITY_CLIENT_ID=$(az identity show --name mimir-identity --resource-group <rg> --query clientId -o tsv)
IDENTITY_PRINCIPAL_ID=$(az identity show --name mimir-identity --resource-group <rg> --query principalId -o tsv)
```

2. Assign to Node Pool:

```bash
az vmss identity assign \
  --resource-group <aks-node-rg> \
  --name <vmss-name> \
  --identities /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/mimir-identity
```

3. Grant Storage Permission:

```bash
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id $IDENTITY_PRINCIPAL_ID \
  --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>
```

4. Configure Mimir:

```yaml
mimir:
  structuredConfig:
    common:
      storage:
        azure:
          user_assigned_id: <IDENTITY_CLIENT_ID>
```

Workload Identity Federation

1. Create Federated Credential:

```bash
az identity federated-credential create \
  --name mimir-federated \
  --identity-name mimir-identity \
  --resource-group <rg> \
  --issuer <aks-oidc-issuer-url> \
  --subject system:serviceaccount:monitoring:mimir \
  --audiences api://AzureADTokenExchange
```

2. Configure Helm Values:

```yaml
serviceAccount:
  annotations:
    azure.workload.identity/client-id: <IDENTITY_CLIENT_ID>

podLabels:
  azure.workload.identity/use: "true"
```

Troubleshooting

Common Issues

1. Container Not Found (Azure)

```bash
# Create required containers
az storage container create --name mimir-blocks --account-name <storage>
az storage container create --name mimir-alertmanager --account-name <storage>
az storage container create --name mimir-ruler --account-name <storage>
```

2. Authorization Failure (Azure)

```bash
# Verify RBAC assignment
az role assignment list --scope /subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Storage/storageAccounts/<storage>

# Assign if missing
az role assignment create \
  --role "Storage Blob Data Contributor" \
  --assignee-object-id <principal-id> \
  --scope <storage-scope>

# Restart pod to refresh token
kubectl delete pod -n monitoring <ingester-pod>
```

3. Ingester OOM

```yaml
ingester:
  resources:
    limits:
      memory: 16Gi  # Increase memory
```

4. Query Timeout

```yaml
mimir:
  structuredConfig:
    querier:
      timeout: 5m
      max_concurrent: 20
```

5. High Cardinality

```yaml
mimir:
  structuredConfig:
    limits:
      max_series_per_user: 5000000
      max_series_per_metric: 50000
```

Diagnostic Commands

```bash
# Check pod status
kubectl get pods -n monitoring -l app.kubernetes.io/name=mimir

# Check ingester logs
kubectl logs -n monitoring -l app.kubernetes.io/component=ingester --tail=100

# Check distributor logs
kubectl logs -n monitoring -l app.kubernetes.io/component=distributor --tail=100

# Verify readiness
kubectl exec -it <mimir-pod> -n monitoring -- wget -qO- http://localhost:8080/ready

# Check ring status
kubectl port-forward svc/mimir-distributor 8080:8080 -n monitoring
curl http://localhost:8080/distributor/ring

# Check configuration
kubectl exec -it <mimir-pod> -n monitoring -- cat /etc/mimir/mimir.yaml

# Validate configuration before deployment
mimir -modules -config.file <path-to-config-file>
```

Key Metrics to Monitor

```promql
# Ingestion rate per tenant
sum by (user) (rate(cortex_distributor_received_samples_total[5m]))

# Series count per tenant
sum by (user) (cortex_ingester_memory_series)

# Query latency
histogram_quantile(0.99, sum by (le) (rate(cortex_request_duration_seconds_bucket{route=~"/api/prom/api/v1/query.*"}[5m])))

# Compactor status
cortex_compactor_runs_completed_total
cortex_compactor_runs_failed_total

# Store-gateway block sync
cortex_bucket_store_blocks_loaded
```
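As a reminder of what `histogram_quantile` computes in the query-latency expression: it finds the cumulative `le` bucket containing the requested rank and interpolates linearly within it. A minimal sketch of that calculation (simplified; PromQL also handles `+Inf` buckets and edge cases not shown here):

```python
def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate the q-th quantile from cumulative (le, count) buckets
    by linear interpolation inside the bucket holding rank q*total."""
    total = buckets[-1][1]
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-1][0]

# 100 requests: 50 under 0.1s, 90 under 0.5s, all under 2.5s.
print(histogram_quantile(0.99, [(0.1, 50), (0.5, 90), (2.5, 100)]))
```

This is why bucket boundaries matter: the p99 estimate is only as precise as the bucket it falls into.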

Circuit Breakers (Ingester)

```yaml
mimir:
  structuredConfig:
    ingester:
      push_circuit_breaker:
        enabled: true
        request_timeout: 2s
        failure_threshold_percentage: 10
        cooldown_period: 10s
      read_circuit_breaker:
        enabled: true
        request_timeout: 30s
```

States:
  1. Closed - Normal operation
  2. Open - Stops forwarding to failing instances
  3. Half-open - Limited trial requests after cooldown
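The three states map onto a small state machine. This toy version (invented names, a simple failure count instead of the percentage-over-window logic the real breaker uses) shows the transitions:

```python
import time

class CircuitBreaker:
    """Toy closed/open/half-open state machine, illustrative only."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 10.0):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self) -> bool:
        # After the cooldown, let a trial request through (half-open).
        if self.state == "open" and time.monotonic() - self.opened_at >= self.cooldown_s:
            self.state = "half-open"
        return self.state != "open"

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.state = 0, "closed"
            return
        self.failures += 1
        # A failed trial, or too many failures, (re)opens the breaker.
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state, self.opened_at = "open", time.monotonic()

cb = CircuitBreaker(failure_threshold=2, cooldown_s=0.01)
cb.record(False)
cb.record(False)
print(cb.state)        # open: requests are rejected
time.sleep(0.02)
print(cb.allow())      # True: half-open trial allowed after cooldown
```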

External Resources
