# Loki Configuration Generator

## Overview
Generate production-ready Grafana Loki server configurations with best practices. Supports monolithic, simple scalable, and microservices deployment modes with S3, GCS, Azure, or filesystem storage.

**Current Stable:** Loki 3.6.2 (November 2025). **Important:** Promtail is deprecated as of 3.4; use Grafana Alloy instead. See `examples/grafana-alloy.yaml` for log collection configuration.
## When to Use

Invoke when: deploying Loki, creating configs from scratch, migrating to Loki, implementing multi-tenant logging, configuring storage backends, or optimizing existing deployments.
## Generation Methods

### Method 1: Script Generation (Recommended)

Use `scripts/generate_config.py` for consistent, validated configurations:

```bash
# Simple Scalable with S3 (production)
python3 scripts/generate_config.py \
  --mode simple-scalable \
  --storage s3 \
  --bucket my-loki-bucket \
  --region us-east-1 \
  --retention-days 30 \
  --otlp-enabled \
  --output loki-config.yaml
```
```bash
# Monolithic with filesystem (development)
python3 scripts/generate_config.py \
  --mode monolithic \
  --storage filesystem \
  --no-auth-enabled \
  --output loki-dev.yaml
```
```bash
# Production with Thanos storage (Loki 3.4+)
python3 scripts/generate_config.py \
  --mode simple-scalable \
  --storage s3 \
  --thanos-storage \
  --otlp-enabled \
  --time-sharding \
  --output loki-thanos.yaml
```
**Script Options:**

| Option | Description |
|--------|-------------|
| `--mode` | monolithic, simple-scalable, microservices |
| `--storage` | filesystem, s3, gcs, azure |
| `--auth-enabled` / `--no-auth-enabled` | Explicitly enable/disable auth |
| `--otlp-enabled` | Enable OTLP ingestion configuration |
| `--thanos-storage` | Use Thanos object storage client (3.4+, cloud backends) |
| `--time-sharding` | Enable out-of-order ingestion (simple-scalable) |
| `--ruler` | Enable alerting/recording rules (not monolithic) |
| `--horizontal-compactor` | main/worker mode (simple-scalable, 3.6+) |
| `--zone-awareness` | Enable multi-AZ placement safeguards |
| `--limits-dry-run` | Log limit rejections without enforcing |
### Method 2: Manual Configuration

Follow the staged workflow below when script generation doesn't meet specific requirements or when learning the configuration structure.
## Output Formats

For Kubernetes deployments, generate BOTH formats:

- Native Loki config (`loki-config.yaml`) - for ConfigMap or direct use
- Helm values (`values.yaml`) - for Helm chart deployments

See `examples/kubernetes-helm-values.yaml` for the Helm format.

## Documentation Lookup
### When to Use Context7/Web Search

**REQUIRED** - Use Context7 MCP for:

- Configuring features from Loki 3.4+ (Thanos storage, time sharding)
- Configuring features from Loki 3.6+ (horizontal compactor, enforced labels)
- Bloom filter configuration (complex, experimental)
- Custom OTLP attribute mappings beyond standard patterns
- Troubleshooting configuration errors

**OPTIONAL** - Skip documentation lookup for:

- Standard deployment modes (monolithic, simple-scalable)
- Basic storage configuration (S3, GCS, Azure, filesystem)
- Default limits and component settings
- Configurations covered in the `references/` directory
### Context7 MCP (preferred)

```
resolve-library-id: "grafana loki"
get-library-docs: /websites/grafana_loki, topic: [component]
```

Example topics: `storage_config`, `limits_config`, `otlp`, `compactor`, `ruler`, `bloom`

### Web Search Fallback

Use when Context7 unavailable:

```
"Grafana Loki 3.6 [component] configuration documentation site:grafana.com"
```
## Configuration Workflow
### Stage 1: Gather Requirements

**Deployment Mode:**

| Mode | Scale | Use Case |
|---|---|---|
| Monolithic | <100GB/day | Testing, development |
| Simple Scalable | 100GB-1TB/day | Production |
| Microservices | >1TB/day | Large-scale, multi-tenant |

**Storage Backend:** S3, GCS, Azure Blob, Filesystem, MinIO

**Key Questions:** Expected log volume? Retention period? Multi-tenancy needed? High availability requirements? Kubernetes deployment?

Ask the user directly if required information is missing.
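The mode table above can be expressed as a small helper. This is an illustrative sketch only; the `recommend_mode` function is not part of the shipped script:

```python
# Illustrative helper (not part of scripts/generate_config.py): map the
# expected daily log volume from Stage 1 to the deployment-mode table above.
def recommend_mode(daily_gb: float) -> str:
    """Return the deployment mode suggested for a given daily volume in GB."""
    if daily_gb < 100:
        return "monolithic"        # testing, development
    if daily_gb <= 1000:
        return "simple-scalable"   # production
    return "microservices"         # large-scale, multi-tenant

print(recommend_mode(50))    # monolithic
print(recommend_mode(500))   # simple-scalable
print(recommend_mode(2000))  # microservices
```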
### Stage 2: Schema Configuration (CRITICAL)

For all new deployments (Loki 2.9+), use TSDB with the v13 schema:

```yaml
schema_config:
  configs:
    - from: "2025-01-01"   # Use deployment date
      store: tsdb
      object_store: s3     # s3, gcs, azure, filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
```

**Key:** The schema cannot change after deployment without migration.
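When adding a new entry to an existing `schema_config` (for example, when migrating an older deployment to TSDB/v13), Loki's guidance is that the new period's `from` date lie in the future so all components cut over at the same moment. A minimal sketch of that pre-flight check, with a hypothetical helper name:

```python
# Hypothetical pre-flight check before appending a new schema_config entry.
# A newly added period's "from" date should be in the future, so ingesters
# and queriers all switch schemas at the same instant.
from datetime import date, timedelta

def valid_new_schema_from(from_date: date, today: date) -> bool:
    """A new schema entry should start no earlier than tomorrow."""
    return from_date >= today + timedelta(days=1)

today = date(2025, 1, 1)
print(valid_new_schema_from(date(2025, 1, 2), today))   # True: starts tomorrow
print(valid_new_schema_from(date(2025, 1, 1), today))   # False: too soon
```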
### Stage 3: Storage Configuration

**S3:**

```yaml
common:
  storage:
    s3:
      s3: s3://us-east-1/loki-bucket
      s3forcepathstyle: false
```

**GCS:** `gcs: { bucket_name: loki-bucket }`

**Azure:** `azure: { container_name: loki-container, account_name: ${AZURE_ACCOUNT_NAME} }`

**Filesystem:** `filesystem: { chunks_directory: /loki/chunks, rules_directory: /loki/rules }`
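In the spirit of `scripts/generate_config.py`, the backend choice above reduces to emitting a different `common.storage` block per backend. This mini-renderer is a hypothetical sketch, not the script's actual implementation:

```python
# Hypothetical mini-renderer illustrating the per-backend storage blocks above.
def render_storage(backend: str, **opts) -> str:
    """Emit a common.storage YAML snippet for the chosen backend (sketch)."""
    if backend == "s3":
        body = (f"    s3:\n"
                f"      s3: s3://{opts['region']}/{opts['bucket']}\n"
                f"      s3forcepathstyle: false")
    elif backend == "gcs":
        body = f"    gcs:\n      bucket_name: {opts['bucket']}"
    elif backend == "filesystem":
        body = ("    filesystem:\n"
                "      chunks_directory: /loki/chunks\n"
                "      rules_directory: /loki/rules")
    else:
        raise ValueError(f"unsupported backend: {backend}")
    return "common:\n  storage:\n" + body

print(render_storage("s3", region="us-east-1", bucket="loki-bucket"))
```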
### Stage 4: Component Configuration
**Ingester:**

```yaml
ingester:
  chunk_encoding: snappy
  chunk_idle_period: 30m
  max_chunk_age: 2h
  chunk_target_size: 1572864   # 1.5MB
  lifecycler:
    ring:
      replication_factor: 3    # 3 for production
```

**Querier:**

```yaml
querier:
  max_concurrent: 4
  query_timeout: 1m
```

**Compactor:**

```yaml
compactor:
  working_directory: /loki/compactor
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
```
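A quick sanity check on the ingester numbers above. This is back-of-envelope arithmetic, not a sizing formula; real memory use depends on compression, flush timing, and replication:

```python
# Sanity arithmetic for the ingester settings above (illustrative only).
# chunk_target_size is in bytes; 1572864 is exactly 1.5 MiB.
target = int(1.5 * 1024 * 1024)
print(target)  # 1572864

# Rough worst case: every active stream holding one full target-size chunk.
# 10,000 streams mirrors the max_streams_per_user default used later.
streams = 10_000
print(f"~{streams * target / 1024**3:.1f} GiB buffered at {streams} full chunks")
```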
### Stage 5: Limits Configuration
```yaml
limits_config:
  ingestion_rate_mb: 10
  ingestion_burst_size_mb: 20
  max_streams_per_user: 10000
  max_entries_limit_per_query: 5000
  max_query_length: 721h
  retention_period: 30d
  allow_structured_metadata: true
  volume_enabled: true
```
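To relate `ingestion_rate_mb` back to the Stage 1 volume question: a sustained per-tenant rate limit translates directly into daily capacity. The arithmetic below assumes decimal units (1000 MB = 1 GB) for a round figure:

```python
# Daily capacity implied by the per-tenant rate limit above (illustrative).
ingestion_rate_mb = 10          # MB/s sustained, from limits_config
seconds_per_day = 86_400
daily_gb = ingestion_rate_mb * seconds_per_day / 1000   # decimal GB
print(f"{daily_gb:.0f} GB/day per tenant")              # 864 GB/day per tenant
```

If the expected volume from Stage 1 exceeds this figure, raise `ingestion_rate_mb` (and the burst size) rather than letting pushes be rate-limited.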
### Stage 6: Server & Auth
```yaml
server:
  http_listen_port: 3100
  grpc_listen_port: 9096
  log_level: info
auth_enabled: true   # false for single-tenant
```
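With `auth_enabled: true`, every push and query must identify its tenant via the `X-Scope-OrgID` header. A standard-library sketch that builds (but does not send) a push request; the endpoint and tenant name are examples:

```python
# When auth_enabled: true, Loki requires an X-Scope-OrgID header on every
# request. Sketch only: the request is constructed, not sent.
import json
import urllib.request

payload = {"streams": [{"stream": {"service": "demo"},
                        "values": [["1735689600000000000", "hello"]]}]}
req = urllib.request.Request(
    "http://loki:3100/loki/api/v1/push",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json",
             "X-Scope-OrgID": "tenant-a"},   # required when auth is on
    method="POST",
)
print(req.get_header("X-scope-orgid"))  # tenant-a
```

Requests without the header are rejected with "no org id" when auth is enabled.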
### Stage 7: OTLP Ingestion (Loki 3.0+)
Native OpenTelemetry ingestion - use the `otlphttp` exporter (NOT the deprecated `lokiexporter`):

```yaml
limits_config:
  allow_structured_metadata: true
  otlp_config:
    resource_attributes:
      attributes_config:
        - action: index_label          # Low-cardinality only!
          attributes: [service.name, service.namespace, deployment.environment]
        - action: structured_metadata  # High-cardinality
          attributes: [k8s.pod.name, service.instance.id]
```

Actions: `index_label` (searchable, low-cardinality), `structured_metadata` (queryable), `drop`.

⚠️ NEVER use `k8s.pod.name` as an index_label - use structured_metadata instead.

**OTel Collector:**

```yaml
exporters:
  otlphttp:
    endpoint: http://loki:3100/otlp
```
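The routing rule above (index labels only for known low-cardinality resource attributes, everything else to structured metadata) can be stated as a one-line policy. The allow-list mirrors the YAML above; the helper itself is illustrative, not a Loki default:

```python
# Illustrative policy check for the OTLP mapping above: only a known
# low-cardinality allow-list becomes index labels.
LOW_CARDINALITY = {"service.name", "service.namespace", "deployment.environment"}

def otlp_action(attribute: str) -> str:
    """index_label for allow-listed attributes, structured_metadata otherwise."""
    return "index_label" if attribute in LOW_CARDINALITY else "structured_metadata"

print(otlp_action("service.name"))   # index_label
print(otlp_action("k8s.pod.name"))   # structured_metadata (never a label)
```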
### Stage 8: Caching
```yaml
chunk_store_config:
  chunk_cache_config:
    memcached_client:
      host: memcached-chunks
      timeout: 500ms
query_range:
  cache_results: true
  results_cache:
    cache:
      memcached_client:
        host: memcached-results
```
### Stage 9: Advanced Features
**Pattern Ingester (3.0+):**

```yaml
pattern_ingester:
  enabled: true
```

**Bloom Filters (Experimental, 3.3+):** Only for >75TB/month deployments. Works on structured metadata only. See `examples/` for config.

**Time Sharding (3.4+):** For out-of-order ingestion:

```yaml
limits_config:
  shard_streams:
    time_sharding_enabled: true
```

**Thanos Storage (3.4+):** New storage client; opt-in now, default later:

```yaml
storage_config:
  use_thanos_objstore: true
  object_store:
    s3:
      bucket_name: my-bucket
      endpoint: s3.us-west-2.amazonaws.com
```
### Stage 10: Ruler (Alerting)
```yaml
ruler:
  storage:
    type: s3
    s3: { bucket_name: loki-ruler }
  alertmanager_url: http://alertmanager:9093
  enable_api: true
  enable_sharding: true
```
### Stage 11: Loki 3.6 Features
- **Horizontally Scalable Compactor:** `horizontal_scaling_mode: main|worker`
- **Policy-Based Enforced Labels:** `enforced_labels: [service.name]`
- **FluentBit v4:** `structured_metadata` parameter support
### Stage 12: Validate Configuration (REQUIRED)

Always validate before deployment:

```bash
# Syntax and parameter validation
loki -config.file=loki-config.yaml -verify-config

# Print resolved configuration (shows defaults)
loki -config.file=loki-config.yaml -print-config-stderr 2>&1 | head -100

# Dry-run with Docker (if Loki not installed locally)
docker run --rm -v $(pwd)/loki-config.yaml:/etc/loki/config.yaml \
  grafana/loki:3.6.2 -config.file=/etc/loki/config.yaml -verify-config
```

**Validation Checklist:**

- [ ] No syntax errors from `-verify-config`
- [ ] Schema uses `tsdb` and `v13`
- [ ] `replication_factor: 3` for production
- [ ] `auth_enabled: true` if multi-tenant
- [ ] Storage credentials/IAM configured
- [ ] Retention period matches requirements
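The checklist lends itself to automation once the YAML is parsed into a dict. A minimal lint sketch (the `lint` helper is hypothetical; `loki -verify-config` remains the authoritative check):

```python
# Hypothetical pre-deployment lint over an already-parsed config dict,
# covering a few of the checklist items above. Pair with -verify-config.
def lint(config: dict, production: bool = True) -> list:
    problems = []
    schema = config.get("schema_config", {}).get("configs", [{}])[-1]
    if schema.get("store") != "tsdb" or schema.get("schema") != "v13":
        problems.append("schema should use tsdb + v13")
    rf = (config.get("common", {}).get("replication_factor")
          or config.get("ingester", {}).get("lifecycler", {})
                   .get("ring", {}).get("replication_factor"))
    if production and rf != 3:
        problems.append("replication_factor should be 3 in production")
    if not config.get("limits_config", {}).get("retention_period"):
        problems.append("retention_period not set")
    return problems

cfg = {"schema_config": {"configs": [{"store": "tsdb", "schema": "v13"}]},
       "common": {"replication_factor": 1},
       "limits_config": {"retention_period": "30d"}}
print(lint(cfg))  # ['replication_factor should be 3 in production']
```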
---

## Production Checklist

### High Availability Requirements
**Zone-Aware Replication** (CRITICAL for production multi-AZ deployments):

When using `replication_factor: 3`, ALWAYS enable zone-awareness for multi-AZ deployments:

```yaml
ingester:
  lifecycler:
    ring:
      replication_factor: 3
      zone_awareness_enabled: true   # CRITICAL for multi-AZ

# Set zone via environment variable or config.
# Each pod should set its zone based on node topology.
common:
  instance_availability_zone: ${AVAILABILITY_ZONE}
```

**Why:** Without zone-awareness, all 3 replicas may land in the same AZ. If that AZ fails, you lose data.

**Kubernetes Implementation:**

```yaml
# In Helm values or pod spec
env:
  - name: AVAILABILITY_ZONE
    valueFrom:
      fieldRef:
        fieldPath: metadata.labels['topology.kubernetes.io/zone']
```
### TLS Configuration (Production Required)

Enable TLS for all inter-component and client communication:

```yaml
server:
  http_tls_config:
    cert_file: /etc/loki/tls/tls.crt
    key_file: /etc/loki/tls/tls.key
    client_ca_file: /etc/loki/tls/ca.crt   # For mTLS
  grpc_tls_config:
    cert_file: /etc/loki/tls/tls.crt
    key_file: /etc/loki/tls/tls.key
    client_ca_file: /etc/loki/tls/ca.crt
```

See `examples/production-tls.yaml` for complete TLS configuration.

### Production Checklist Summary
| Requirement | Setting | Required For |
|---|---|---|
| `replication_factor: 3` | common block | All production |
| `zone_awareness_enabled: true` | ingester.lifecycler.ring | Multi-AZ |
| `auth_enabled: true` | root level | Multi-tenant |
| TLS enabled | server block | All production |
| IAM roles (not keys) | storage config | Cloud storage |
| Caching enabled | chunk_store_config, query_range | Performance |
| Pattern ingester | pattern_ingester.enabled | Observability |
| Retention configured | compactor + limits_config | Cost control |
## Monitoring Recommendations

### Key Metrics to Monitor
Configure Prometheus to scrape Loki metrics and alert on these critical indicators:

```yaml
# Prometheus scrape config
- job_name: 'loki'
  static_configs:
    - targets: ['loki:3100']
```
### Critical Alerts

```yaml
groups:
  - name: loki-critical
    rules:
      # Ingestion failures
      - alert: LokiIngestionFailures
        expr: sum(rate(loki_distributor_ingester_append_failures_total[5m])) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki ingestion failures detected"

      # High stream cardinality (performance killer)
      - alert: LokiHighStreamCardinality
        expr: loki_ingester_memory_streams > 100000
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High stream cardinality - review labels"

      # Compaction not running (retention broken)
      - alert: LokiCompactionStalled
        expr: time() - loki_compactor_last_successful_run_timestamp_seconds > 7200
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Loki compaction stalled - retention not enforced"

      # Query latency
      - alert: LokiSlowQueries
        expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{route=~"loki_api_v1_query.*"}[5m])) by (le)) > 30
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki query P99 latency > 30s"

      # Ingester memory pressure
      - alert: LokiIngesterMemoryHigh
        expr: container_memory_usage_bytes{container="ingester"} / container_spec_memory_limit_bytes{container="ingester"} > 0.8
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Loki ingester memory usage > 80%"
```

### Key Metrics Reference
| Metric | Description | Action Threshold |
|---|---|---|
| `loki_ingester_memory_streams` | Active streams in memory | >100k: review cardinality |
| `loki_distributor_ingester_append_failures_total` | Ingestion failures | >0: investigate immediately |
| `loki_request_duration_seconds` | Query latency | P99 >30s: add caching/queriers |
| `loki_ingester_chunks_flushed_total` | Chunk flush rate | Low rate: check ingester health |
| `loki_compactor_last_successful_run_timestamp_seconds` | Last compaction | >2h ago: compaction broken |
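The `LokiCompactionStalled` rule above reduces to a single age comparison, which is worth internalizing since a stalled compactor silently disables retention:

```python
# The LokiCompactionStalled rule, restated as plain arithmetic (illustrative):
# alert when the last successful compactor run is more than 7200 s (2 h) old.
def compaction_stalled(now_s: float, last_success_s: float,
                       max_age_s: int = 7200) -> bool:
    return now_s - last_success_s > max_age_s

print(compaction_stalled(now_s=10_000, last_success_s=5_000))  # False: 5000 s old
print(compaction_stalled(now_s=10_000, last_success_s=2_000))  # True: 8000 s old
```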
### Grafana Dashboard

Import official Loki dashboards:

- Dashboard ID `13407` - Loki Logs
- Dashboard ID `14055` - Loki Operational
## Log Collection with Grafana Alloy

Promtail is deprecated (support ends Feb 2026). Use Grafana Alloy for new deployments.

### Basic Alloy Configuration

See `examples/grafana-alloy.yaml` for the complete configuration.

```alloy
// Kubernetes log discovery
discovery.kubernetes "pods" {
  role = "pod"
}

// Relabeling for Kubernetes metadata
discovery.relabel "pods" {
  targets = discovery.kubernetes.pods.targets
  rule {
    source_labels = ["__meta_kubernetes_namespace"]
    target_label  = "namespace"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_name"]
    target_label  = "pod"
  }
  rule {
    source_labels = ["__meta_kubernetes_pod_container_name"]
    target_label  = "container"
  }
}

// Log collection
loki.source.kubernetes "pods" {
  targets    = discovery.relabel.pods.output
  forward_to = [loki.write.default.receiver]
}

// Send to Loki
loki.write "default" {
  endpoint {
    url = "http://loki-gateway.loki.svc.cluster.local/loki/api/v1/push"
    // For multi-tenant
    tenant_id = "default"
  }
}
```

### Migration from Promtail
```bash
# Convert Promtail config to Alloy
alloy convert --source-format=promtail --output=alloy-config.alloy promtail.yaml
```

---
## Complete Examples

See the `examples/` directory for full configurations:

- `monolithic-filesystem.yaml` - Development/testing
- `simple-scalable-s3.yaml` - Production with S3
- `microservices-s3.yaml` - Large-scale distributed
- `multi-tenant.yaml` - Multi-tenant with per-tenant limits
- `production-tls.yaml` - TLS-enabled production config
- `grafana-alloy.yaml` - Log collection with Alloy
- `kubernetes-helm-values.yaml` - Helm chart values

**Minimal Monolithic:**
```yaml
auth_enabled: false
server:
  http_listen_port: 3100
common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    kvstore:
      store: inmemory
schema_config:
  configs:
    - from: 2025-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: loki_index_
        period: 24h
limits_config:
  retention_period: 30d
  allow_structured_metadata: true
compactor:
  working_directory: /loki/compactor
  retention_enabled: true
```
## Helm Deployment
```bash
helm repo add grafana https://grafana.github.io/helm-charts
helm install loki grafana/loki -f values.yaml
```

Generate both the native config and Helm values for Kubernetes deployments.

**values.yaml:**
```yaml
deploymentMode: SimpleScalable
loki:
  schemaConfig:
    configs:
      - from: "2025-01-01"
        store: tsdb
        object_store: s3
        schema: v13
        index:
          prefix: loki_index_
          period: 24h
  limits_config:
    retention_period: 30d
    allow_structured_metadata: true
  # Zone awareness for HA
  ingester:
    lifecycler:
      ring:
        zone_awareness_enabled: true
backend:
  replicas: 3
  # Spread across zones
  topologySpreadConstraints:
    - maxSkew: 1
      topologyKey: topology.kubernetes.io/zone
      whenUnsatisfiable: DoNotSchedule
read:
  replicas: 3
write:
  replicas: 3
```

---
## Best Practices
**Performance:**

- `chunk_encoding: snappy`, `chunk_target_size: 1572864`
- Enable caching (chunks, results)
- `parallelise_shardable_queries: true`

**Security:**

- `auth_enabled: true` with reverse proxy auth
- IAM roles for cloud storage (never hardcode keys)
- TLS for all communications (see Production Checklist)

**Reliability:**

- `replication_factor: 3` for production
- `zone_awareness_enabled: true` for multi-AZ (see Production Checklist)
- Persistent volumes for ingesters
- Monitor ingestion rate and query latency (see Monitoring section)

**Limits:** Set `ingestion_rate_mb` and `max_streams_per_user` to prevent overload.
## Common Issues
| Issue | Solution |
|---|---|
| High ingester memory | Reduce `max_chunk_age` / `chunk_idle_period` |
| Slow queries | Increase `querier.max_concurrent`, enable caching |
| Ingestion failures | Check `ingestion_rate_mb` limits |
| Storage growing fast | Enable retention, check compression, review cardinality |
| Data loss in AZ failure | Enable `zone_awareness_enabled` |
| Config validation fails | Run `loki -config.file=... -verify-config` |
## Deprecated (Migrate Away)

- `boltdb-shipper` → `tsdb`
- `lokiexporter` → `otlphttp`
- Promtail → Grafana Alloy (support ends Feb 2026)
## Resources

- `scripts/generate_config.py` - Generate configs programmatically (RECOMMENDED)
- `examples/` - Complete configuration examples for all modes
- `references/` - Full parameter reference and best practices
## Related Skills

- `logql-generator` - LogQL query generation
- `fluentbit-generator` - Log collection to Loki