loki-config-generator


Loki Configuration Generator


Overview


Generate production-ready Grafana Loki server configurations with best practices. Supports monolithic, simple scalable, and microservices deployment modes with S3, GCS, Azure, or filesystem storage.

Current stable: Loki 3.6.2 (November 2025). Important: Promtail is deprecated as of Loki 3.4; use Grafana Alloy instead. See `examples/grafana-alloy.yaml` for log collection configuration.

When to Use

Invoke when: deploying Loki, creating configs from scratch, migrating to Loki, implementing multi-tenant logging, configuring storage backends, or optimizing existing deployments.

Generation Methods


Method 1: Script Generation (Recommended)


Use `scripts/generate_config.py` for consistent, validated configurations:

Simple Scalable with S3 (production):
```bash
python3 scripts/generate_config.py \
  --mode simple-scalable \
  --storage s3 \
  --bucket my-loki-bucket \
  --region us-east-1 \
  --retention-days 30 \
  --otlp-enabled \
  --output loki-config.yaml
```

Monolithic with filesystem (development):
```bash
python3 scripts/generate_config.py \
  --mode monolithic \
  --storage filesystem \
  --no-auth-enabled \
  --output loki-dev.yaml
```

Production with Thanos storage (Loki 3.4+):
```bash
python3 scripts/generate_config.py \
  --mode simple-scalable \
  --storage s3 \
  --thanos-storage \
  --otlp-enabled \
  --time-sharding \
  --output loki-thanos.yaml
```

**Script Options:**
| Option | Description |
|--------|-------------|
| `--mode` | monolithic, simple-scalable, microservices |
| `--storage` | filesystem, s3, gcs, azure |
| `--auth-enabled` / `--no-auth-enabled` | Explicitly enable/disable auth |
| `--otlp-enabled` | Enable OTLP ingestion configuration |
| `--thanos-storage` | Use Thanos object storage client (3.4+, cloud backends) |
| `--time-sharding` | Enable out-of-order ingestion (simple-scalable) |
| `--ruler` | Enable alerting/recording rules (not monolithic) |
| `--horizontal-compactor` | main/worker mode (simple-scalable, 3.6+) |
| `--zone-awareness` | Enable multi-AZ placement safeguards |
| `--limits-dry-run` | Log limit rejections without enforcing |

Method 2: Manual Configuration


Follow the staged workflow below when script generation doesn't meet specific requirements or when learning the configuration structure.

Output Formats


For Kubernetes deployments, generate BOTH formats:
  1. Native Loki config (`loki-config.yaml`) - for a ConfigMap or direct use
  2. Helm values (`values.yaml`) - for Helm chart deployments
See `examples/kubernetes-helm-values.yaml` for the Helm format.
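For the ConfigMap route, the generated file can be embedded directly. A minimal sketch — the ConfigMap name, namespace, and data key are assumptions and must match your pod's volume mounts:

```yaml
# Hypothetical names; align with your StatefulSet/Deployment volume mounts
apiVersion: v1
kind: ConfigMap
metadata:
  name: loki-config
  namespace: loki
data:
  config.yaml: |
    # paste the generated loki-config.yaml contents here
```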

Documentation Lookup


When to Use Context7/Web Search


REQUIRED - Use Context7 MCP for:
  • Configuring features from Loki 3.4+ (Thanos storage, time sharding)
  • Configuring features from Loki 3.6+ (horizontal compactor, enforced labels)
  • Bloom filter configuration (complex, experimental)
  • Custom OTLP attribute mappings beyond standard patterns
  • Troubleshooting configuration errors
OPTIONAL - Skip documentation lookup for:
  • Standard deployment modes (monolithic, simple-scalable)
  • Basic storage configuration (S3, GCS, Azure, filesystem)
  • Default limits and component settings
  • Configurations covered in the `references/` directory

Context7 MCP (preferred)

resolve-library-id: "grafana loki"
get-library-docs: /websites/grafana_loki, topic: [component]
Example topics: `storage_config`, `limits_config`, `otlp`, `compactor`, `ruler`, `bloom`

Web Search Fallback


Use when Context7 is unavailable:
"Grafana Loki 3.6 [component] configuration documentation site:grafana.com"

Configuration Workflow


Stage 1: Gather Requirements


Deployment Mode:

| Mode | Scale | Use Case |
|------|-------|----------|
| Monolithic | <100GB/day | Testing, development |
| Simple Scalable | 100GB-1TB/day | Production |
| Microservices | >1TB/day | Large-scale, multi-tenant |

Storage Backend: S3, GCS, Azure Blob, Filesystem, MinIO
Key Questions: Expected log volume? Retention period? Multi-tenancy needed? High availability requirements? Kubernetes deployment?
Ask the user directly if required information is missing.

Stage 2: Schema Configuration (CRITICAL)


For all new deployments (Loki 2.9+), use TSDB with the v13 schema:
```yaml
schema_config:
  configs:
    - from: "2025-01-01"  # Use deployment date
      store: tsdb
      object_store: s3     # s3, gcs, azure, filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
```
Key: the schema cannot be changed after deployment without a migration.

Stage 3: Storage Configuration


S3:
```yaml
common:
  storage:
    s3:
      s3: s3://us-east-1/loki-bucket
      s3forcepathstyle: false
```
GCS: `gcs: { bucket_name: loki-bucket }`
Azure: `azure: { container_name: loki-container, account_name: ${AZURE_ACCOUNT_NAME} }`
Filesystem: `filesystem: { chunks_directory: /loki/chunks, rules_directory: /loki/rules }`

Stage 4: Component Configuration


Ingester:
```yaml
ingester:
  chunk_encoding: snappy
  chunk_idle_period: 30m
  max_chunk_age: 2h
  chunk_target_size: 1572864  # 1.5MB
  lifecycler:
    ring:
      replication_factor: 3  # 3 for production
```
Querier:
```yaml
querier:
  max_concurrent: 4
  query_timeout: 1m
```
Compactor:
```yaml
compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
```

Stage 5: Limits Configuration


```yaml
limits_config:
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 10000
  max_entries_limit_per_query: 5000
  max_query_length: 721h
  retention_period: 30d
  allow_structured_metadata: true
  volume_enabled: true
```
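These limits are global defaults. In multi-tenant deployments they can be overridden per tenant via a runtime overrides file; a sketch, assuming the file path and tenant IDs shown here:

```yaml
# In loki-config.yaml
limits_config:
  per_tenant_override_config: /etc/loki/overrides.yaml
```

```yaml
# /etc/loki/overrides.yaml (hypothetical tenant IDs)
overrides:
  tenant-a:
    ingestion_rate_mb: 20
    retention_period: 90d
  tenant-b:
    max_streams_per_user: 50000
```

Loki re-reads the overrides file periodically, so per-tenant changes do not require a restart.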

Stage 6: Server & Auth


```yaml
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info

auth_enabled: true  # false for single-tenant
```

Stage 7: OTLP Ingestion (Loki 3.0+)


Native OpenTelemetry ingestion: use the `otlphttp` exporter (NOT the deprecated `lokiexporter`):
```yaml
limits_config:
  allow_structured_metadata: true
  otlp_config:
    resource_attributes:
      attributes_config:
        - action: index_label  # Low-cardinality only!
          attributes: [service.name, service.namespace, deployment.environment]
        - action: structured_metadata  # High-cardinality
          attributes: [k8s.pod.name, service.instance.id]
```
Actions: `index_label` (searchable, low-cardinality), `structured_metadata` (queryable), `drop` (discard).
⚠️ NEVER use `k8s.pod.name` as an `index_label` - use structured metadata instead.
OTel Collector:
```yaml
exporters:
  otlphttp:
    endpoint: http://loki:3100/otlp
```

Stage 8: Caching


```yaml
chunk_store_config:
  chunk_cache_config:
    memcached_client:
      host: memcached-chunks
      timeout: 500ms

query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        host: memcached-results
```

Stage 9: Advanced Features


Pattern Ingester (3.0+):
```yaml
pattern_ingester:
  enabled: true
```
Bloom Filters (Experimental, 3.3+): Only for >75TB/month deployments. Works on structured metadata only. See examples/ for config.
Time Sharding (3.4+): For out-of-order ingestion:
```yaml
limits_config:
  shard_streams:
    time_sharding_enabled: true
```
Thanos Storage (3.4+): New storage client, opt-in now, default later:
```yaml
storage_config:
  use_thanos_objstore: true
  object_store:
    s3:
      bucket_name: my-bucket
      endpoint: s3.us-west-2.amazonaws.com
```

Stage 10: Ruler (Alerting)


```yaml
ruler:
  storage:
    type: s3
    s3: { bucket_name: loki-ruler }
  alertmanager_url: http://alertmanager:9093
  enable_api: true
  enable_sharding: true
```
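The ruler block above only wires up storage and Alertmanager; the rules themselves are Prometheus-style groups with LogQL expressions. A sketch of a rule file — the stream selector, threshold, and group name are assumptions:

```yaml
groups:
  - name: app-alerts
    rules:
      - alert: HighErrorRate
        # LogQL metric query: error log lines per second over 5m
        expr: sum(rate({app="my-app"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Error log rate above 10/s for my-app"
```

Rule files are uploaded to the ruler's storage (or via the ruler API when `enable_api: true`), namespaced per tenant.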

Stage 11: Loki 3.6 Features


  • Horizontally scalable compactor: `horizontal_scaling_mode: main|worker`
  • Policy-based enforced labels: `enforced_labels: [service.name]`
  • FluentBit v4: `structured_metadata` parameter support

Stage 12: Validate Configuration (REQUIRED)


Always validate before deployment:
```bash
# Syntax and parameter validation
loki -config.file=loki-config.yaml -verify-config

# Print resolved configuration (shows defaults)
loki -config.file=loki-config.yaml -print-config-stderr 2>&1 | head -100

# Dry-run with Docker (if Loki is not installed locally)
docker run --rm -v $(pwd)/loki-config.yaml:/etc/loki/config.yaml \
  grafana/loki:3.6.2 -config.file=/etc/loki/config.yaml -verify-config
```

**Validation Checklist:**
- [ ] No syntax errors from `-verify-config`
- [ ] Schema uses `tsdb` and `v13`
- [ ] `replication_factor: 3` for production
- [ ] `auth_enabled: true` if multi-tenant
- [ ] Storage credentials/IAM configured
- [ ] Retention period matches requirements
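The Docker dry-run also fits naturally into CI so broken configs never reach a cluster. A sketch of a GitHub Actions step — the workflow layout and file names are assumptions:

```yaml
# .github/workflows/validate-loki.yaml (hypothetical)
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Verify Loki config
        run: |
          docker run --rm -v "$PWD/loki-config.yaml:/etc/loki/config.yaml" \
            grafana/loki:3.6.2 -config.file=/etc/loki/config.yaml -verify-config
```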

---

Production Checklist


High Availability Requirements


Zone-Aware Replication (CRITICAL for production multi-AZ deployments):
When using `replication_factor: 3`, ALWAYS enable zone awareness for multi-AZ deployments:
```yaml
ingester:
  lifecycler:
    ring:
      replication_factor: 3
      zone_awareness_enabled: true  # CRITICAL for multi-AZ
```

Set the zone via environment variable or config; each pod should set its zone based on node topology:
```yaml
common:
  instance_availability_zone: ${AVAILABILITY_ZONE}
```

**Why:** Without zone awareness, all 3 replicas may land in the same AZ. If that AZ fails, you lose data.

**Kubernetes Implementation:**
```yaml
# In Helm values or pod spec
env:
  - name: AVAILABILITY_ZONE
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['topology.kubernetes.io/zone']
```

TLS Configuration (Production Required)


Enable TLS for all inter-component and client communication:
```yaml
server:
  http_tls_config:
    cert_file: /etc/loki/tls/tls.crt
    key_file: /etc/loki/tls/tls.key
    client_ca_file: /etc/loki/tls/ca.crt  # For mTLS
  grpc_tls_config:
    cert_file: /etc/loki/tls/tls.crt
    key_file: /etc/loki/tls/tls.key
    client_ca_file: /etc/loki/tls/ca.crt
```
See `examples/production-tls.yaml` for the complete TLS configuration.

Production Checklist Summary


| Requirement | Setting | Required For |
|-------------|---------|--------------|
| `replication_factor: 3` | common block | All production |
| `zone_awareness_enabled: true` | ingester.lifecycler.ring | Multi-AZ |
| `auth_enabled: true` | root level | Multi-tenant |
| TLS enabled | server block | All production |
| IAM roles (not keys) | storage config | Cloud storage |
| Caching enabled | chunk_store_config, query_range | Performance |
| Pattern ingester | pattern_ingester.enabled | Observability |
| Retention configured | compactor + limits_config | Cost control |
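Pulled together, the checklist's required settings form roughly this skeleton (a sketch, not a complete config; TLS, caching, and storage credentials are elided to their dedicated sections):

```yaml
auth_enabled: true                   # Multi-tenant

ingester:
  lifecycler:
    ring:
      replication_factor: 3          # All production
      zone_awareness_enabled: true   # Multi-AZ

pattern_ingester:
  enabled: true                      # Observability

compactor:
  retention_enabled: true            # Cost control
limits_config:
  retention_period: 30d
```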

Monitoring Recommendations


Key Metrics to Monitor


Configure Prometheus to scrape Loki metrics and alert on these critical indicators:
```yaml
# Prometheus scrape config
scrape_configs:
  - job_name: 'loki'
    static_configs:
      - targets: ['loki:3100']
```

Critical Alerts


```yaml
groups:
  - name: loki-critical
    rules:
      # Ingestion failures
      - alert: LokiIngestionFailures
        expr: sum(rate(loki_distributor_ingester_append_failures_total[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki ingestion failures detected"

      # High stream cardinality (performance killer)
      - alert: LokiHighStreamCardinality
        expr: loki_ingester_memory_streams > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High stream cardinality - review labels"

      # Compaction not running (retention broken)
      - alert: LokiCompactionStalled
        expr: time() - loki_compactor_last_successful_run_timestamp_seconds > 7200
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki compaction stalled - retention not enforced"

      # Query latency
      - alert: LokiSlowQueries
        expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki query P99 latency > 30s"

      # Ingester memory pressure
      - alert: LokiIngesterMemoryHigh
        expr: container_memory_usage_bytes{container="ingester"} / container_spec_memory_limit_bytes{container="ingester"} > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki ingester memory usage > 80%"
```

Key Metrics Reference


| Metric | Description | Action Threshold |
|--------|-------------|------------------|
| `loki_ingester_memory_streams` | Active streams in memory | >100k: review cardinality |
| `loki_distributor_ingester_append_failures_total` | Ingestion failures | >0: investigate immediately |
| `loki_request_duration_seconds` | Query latency | P99 >30s: add caching/queriers |
| `loki_ingester_chunks_flushed_total` | Chunk flush rate | Low rate: check ingester health |
| `loki_compactor_last_successful_run_timestamp_seconds` | Last compaction | >2h ago: compaction broken |

Grafana Dashboard


Import official Loki dashboards:
  • Dashboard ID 13407 - Loki Logs
  • Dashboard ID 14055 - Loki Operational
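Alongside the dashboards, the Loki datasource itself can be provisioned declaratively in Grafana; a sketch (the URL assumes Loki is reachable as `loki:3100` from Grafana):

```yaml
# /etc/grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
```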

Log Collection with Grafana Alloy


Promtail is deprecated (support ends Feb 2026). Use Grafana Alloy for new deployments.

Basic Alloy Configuration


See `examples/grafana-alloy.yaml` for the complete configuration.
```alloy
// Kubernetes log discovery
discovery.kubernetes "pods" {
  role = "pod"
}

// Relabeling for Kubernetes metadata
discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets

  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
}

// Log collection
loki.source.kubernetes "pods" {
  targets    = discovery.relabel.pods.output
  forward_to = [loki.write.default.receiver]
}

// Send to Loki
loki.write "default" {
  endpoint {
    url = "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"

    // For multi-tenant
    tenant_id = "default"
  }
}
```

Migration from Promtail


```bash
# Convert Promtail config to Alloy
alloy convert --source-format=promtail --output=alloy-config.alloy promtail.yaml
```

---

Complete Examples


See the `examples/` directory for full configurations:
  • monolithic-filesystem.yaml - Development/testing
  • simple-scalable-s3.yaml - Production with S3
  • microservices-s3.yaml - Large-scale distributed
  • multi-tenant.yaml - Multi-tenant with per-tenant limits
  • production-tls.yaml - TLS-enabled production config
  • grafana-alloy.yaml - Log collection with Alloy
  • kubernetes-helm-values.yaml - Helm chart values

Minimal Monolithic:
```yaml
auth_enabled: false
server:
  http_listen_port: 3100

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2025-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h

limits_config:
  retention_period: 30d
  allow_structured_metadata: true

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
```

Helm Deployment


```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki -f values.yaml
```
Generate both the native config and Helm values for Kubernetes deployments.

```yaml
# values.yaml
deploymentMode: SimpleScalable

loki:
  schemaConfig:
    configs:
      - from: "2025-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    retention_period: 30d
    allow_structured_metadata: true
  # Zone awareness for HA
  ingester:
    lifecycler:
      ring:
        zone_awareness_enabled: true

backend:
  replicas: 3
  # Spread across zones
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
read:
  replicas: 3
write:
  replicas: 3
```

---

Best Practices


Performance:
  • `chunk_encoding: snappy`, `chunk_target_size: 1572864`
  • Enable caching (chunks, results)
  • `parallelise_shardable_queries: true`
Security:
  • `auth_enabled: true` with reverse proxy auth
  • IAM roles for cloud storage (never hardcode keys)
  • TLS for all communications (see Production Checklist)
Reliability:
  • `replication_factor: 3` for production
  • `zone_awareness_enabled: true` for multi-AZ (see Production Checklist)
  • Persistent volumes for ingesters
  • Monitor ingestion rate and query latency (see Monitoring section)
Limits: Set `ingestion_rate_mb` and `max_streams_per_user` to prevent overload.

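For the query-parallelism tip, `parallelise_shardable_queries` lives under `query_range` and pairs with query splitting; a sketch (the 15m split interval is an assumption — tune it per workload):

```yaml
query_range:
  parallelise_shardable_queries: true

limits_config:
  split_queries_by_interval: 15m  # hypothetical value; smaller = more parallelism, more scheduling overhead
```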

Common Issues


| Issue | Solution |
|-------|----------|
| High ingester memory | Reduce `max_streams_per_user`, lower `chunk_idle_period` |
| Slow queries | Increase `max_concurrent`, enable parallelization, add caching |
| Ingestion failures | Check `ingestion_rate_mb`, verify storage connectivity |
| Storage growing fast | Enable retention, check compression, review cardinality |
| Data loss in AZ failure | Enable `zone_awareness_enabled: true` |
| Config validation fails | Run `loki -verify-config`, check YAML syntax |

Deprecated (Migrate Away)


  • `boltdb-shipper` → `tsdb`
  • `lokiexporter` → `otlphttp`
  • Promtail → Grafana Alloy (support ends Feb 2026)

Resources


  • scripts/generate_config.py - Generate configs programmatically (RECOMMENDED)
  • examples/ - Complete configuration examples for all modes
  • references/ - Full parameter reference and best practices

Related Skills


  • logql-generator - LogQL query generation
  • fluentbit-generator - Log collection to Loki