fluentbit-validator

Fluent Bit Config Validator

Overview

This skill provides a comprehensive validation workflow for Fluent Bit configurations, combining syntax validation, semantic checks, security auditing, best practice enforcement, and dry-run testing. Validate Fluent Bit configs with confidence before deploying to production.
Fluent Bit uses an INI-like configuration format with sections ([SERVICE], [INPUT], [FILTER], [OUTPUT], [PARSER]) and key-value pairs. This validator ensures configurations are syntactically correct, semantically valid, secure, and optimized for production use.

When to Use This Skill

Invoke this skill when:
  • Validating Fluent Bit configurations before deployment
  • Debugging configuration syntax errors
  • Testing configurations with fluent-bit --dry-run
  • Working with custom plugins that need documentation
  • Ensuring configs follow Fluent Bit best practices
  • Auditing configurations for security issues
  • Optimizing performance settings (buffers, flush intervals)
  • The user asks to "validate", "lint", "check", or "test" Fluent Bit configs
  • Troubleshooting configuration-related errors

Validation Workflow

Follow this sequential validation workflow. Each stage catches different types of issues.
Recommended: For comprehensive validation, use `--check all`, which runs all validation stages in sequence:
```bash
python3 scripts/validate_config.py --file <config-file> --check all
```
Individual check modes are available for targeted validation when debugging specific issues.

Stage 1: Configuration File Structure

Verify the basic file structure and format:
```bash
python3 scripts/validate_config.py --file <config-file> --check structure
```
Expected format:
  • INI-style sections with `[SECTION]` headers
  • Key-value pairs with proper spacing
  • Comments starting with `#`
  • Sections: SERVICE, INPUT, FILTER, OUTPUT, PARSER (or MULTILINE_PARSER)
  • Consistent indentation (spaces recommended over tabs)
Common issues caught:
  • Missing section headers
  • Malformed key-value pairs
  • Invalid section names
  • Syntax errors (unclosed brackets, etc.)
  • Mixed tabs and spaces
  • UTF-8 encoding issues
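The structural checks above can be sketched in a few lines of Python. This is an illustrative approximation, not the code in `scripts/validate_config.py`; the function name `check_structure` and the exact messages are assumptions made for this example.

```python
import re

# Section names taken from the "Expected format" list above.
KNOWN_SECTIONS = {"SERVICE", "INPUT", "FILTER", "OUTPUT", "PARSER", "MULTILINE_PARSER"}
SECTION_RE = re.compile(r"^\[([A-Za-z_]+)\]$")


def check_structure(text):
    """Return a list of (line_number, message) structural problems (hypothetical helper)."""
    problems = []
    in_section = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comment lines are always accepted
        if line.startswith("["):
            match = SECTION_RE.match(line)
            if match is None:
                problems.append((number, "malformed section header"))
            elif match.group(1).upper() not in KNOWN_SECTIONS:
                problems.append((number, f"invalid section name '{match.group(1)}'"))
            in_section = match is not None
            continue
        if not in_section:
            problems.append((number, "key-value pair outside any section"))
        elif len(line.split(None, 1)) < 2:
            problems.append((number, "malformed key-value pair (missing value)"))
        indent = raw[: len(raw) - len(raw.lstrip())]
        if "\t" in indent and " " in indent:
            problems.append((number, "mixed tabs and spaces in indentation"))
    return problems
```

Each finding carries the line number so that it can be reported in the `[Line N]` style used later in the report stage.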

Stage 2: Section Validation

Validate all configuration sections (SERVICE, INPUT, FILTER, OUTPUT, PARSER):
```bash
python3 scripts/validate_config.py --file <config-file> --check sections
```
This single command validates all section types. The checks performed for each section type are detailed below.

SERVICE Section Checks

Checks:
  • Required parameters: Flush
  • Valid parameter names (no typos)
  • Parameter value types (Flush must be numeric)
  • Log_Level values: off, error, warn, info, debug, trace
  • HTTP_Server values: On/Off
  • Parsers_File references (file existence)
Common issues:
  • Missing Flush parameter
  • Invalid Log_Level value
  • Parsers_File path doesn't exist
  • Negative or zero Flush interval
Best practices:
  • Flush: 1-5 seconds (balance latency vs. efficiency)
  • Log_Level: info for production, debug for troubleshooting
  • HTTP_Server: On (for health checks and metrics)
  • storage.metrics: on (for monitoring)
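As a rough illustration of the SERVICE checks listed above, the sketch below validates `Flush`, `Log_Level`, and `HTTP_Server`. The function name `check_service` and the dict-based input are assumptions for this example; the real script may organize these checks differently.

```python
VALID_LOG_LEVELS = {"off", "error", "warn", "info", "debug", "trace"}


def check_service(params):
    """Validate a [SERVICE] section given as a dict of key -> string value."""
    issues = []
    # Assumption of this sketch: keys are compared case-insensitively.
    lowered = {key.lower(): value for key, value in params.items()}

    flush = lowered.get("flush")
    if flush is None:
        issues.append("missing Flush parameter")
    else:
        try:
            if float(flush) <= 0:
                issues.append("Flush must be greater than zero")
        except ValueError:
            issues.append(f"Flush must be numeric, got '{flush}'")

    level = lowered.get("log_level")
    if level is not None and level.lower() not in VALID_LOG_LEVELS:
        issues.append(f"invalid Log_Level '{level}'")

    server = lowered.get("http_server")
    if server is not None and server.lower() not in {"on", "off"}:
        issues.append(f"HTTP_Server must be On or Off, got '{server}'")
    return issues
```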

INPUT Section Checks

Checks:
  • Required parameters: Name
  • Valid plugin names (tail, systemd, tcp, forward, http, etc.)
  • Tag format (no spaces, valid characters)
  • File paths exist (for tail plugin)
  • Memory limits are set (Mem_Buf_Limit)
  • DB file paths are valid
  • Port numbers are in valid range (1-65535)
Common issues:
  • Missing Name parameter
  • Invalid plugin name (typo)
  • Missing Tag parameter
  • Path doesn't exist
  • Missing Mem_Buf_Limit (OOM risk)
  • Missing DB file (no position tracking)
  • Port conflicts
Best practices:
  • Always set Mem_Buf_Limit (50-100MB typical)
  • Use DB for tail inputs (crash recovery)
  • Set Skip_Long_Lines On (prevents hang)
  • Use appropriate Tag patterns for routing
  • Set Refresh_Interval for tail (10 seconds typical)
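The INPUT checks above could look roughly like the following. This is a minimal sketch under the assumption that a section has already been parsed into a dict; `check_input` is a hypothetical name, and only a subset of the listed checks is shown.

```python
def check_input(params):
    """Validate an [INPUT] section given as a dict of key -> string value."""
    issues = []
    lowered = {key.lower(): value for key, value in params.items()}

    name = lowered.get("name")
    if not name:
        issues.append("missing Name parameter")

    tag = lowered.get("tag")
    if tag is not None and any(ch.isspace() for ch in tag):
        issues.append("Tag must not contain spaces")

    port = lowered.get("port")
    if port is not None:
        is_number = port.isascii() and port.isdigit()
        if not is_number or not 1 <= int(port) <= 65535:
            issues.append(f"Port '{port}' is outside 1-65535")

    # tail-specific recommendations from the best-practices list above
    if name and name.lower() == "tail":
        if "mem_buf_limit" not in lowered:
            issues.append("tail input has no Mem_Buf_Limit (OOM risk)")
        if "db" not in lowered:
            issues.append("tail input has no DB file (no position tracking)")
    return issues
```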

FILTER Section Checks

Checks:
  • Required parameters: Name, Match (or Match_Regex)
  • Valid filter plugin names
  • Match pattern syntax
  • Tag pattern wildcards are valid
  • Filter-specific parameters
Common issues:
  • Missing Match parameter
  • Invalid filter plugin name
  • Match pattern doesn't match any INPUT tags
  • Missing required plugin-specific parameters
Best practices:
  • Use specific Match patterns (avoid "*" unless intended)
  • Order filters logically (parsers before modifiers)
  • Use kubernetes filter in K8s environments
  • Parse JSON logs early in pipeline

OUTPUT Section Checks

Checks:
  • Required parameters: Name, Match
  • Valid output plugin names (including elasticsearch, kafka, loki, s3, cloudwatch, http, forward, file, opentelemetry)
  • Host/Port validity
  • Retry_Limit is set
  • Storage limits are configured
  • TLS configuration (if enabled)
  • OpenTelemetry-specific: URI endpoints (metrics_uri, logs_uri, traces_uri), authentication headers, resource attributes
Common issues:
  • Missing Match parameter
  • Invalid output plugin name
  • Match pattern doesn't match any INPUT tags
  • Missing Retry_Limit (infinite retries risk)
  • Missing storage.total_limit_size (disk exhaustion risk)
  • Hardcoded credentials (security issue)
Best practices:
  • Set Retry_Limit 3-5
  • Configure storage.total_limit_size
  • Enable TLS in production
  • Use environment variables for credentials
  • Enable compression when available

PARSER Section Checks

Checks:
  • Required parameters: Name, Format
  • Valid parser formats: json, regex, logfmt, ltsv
  • Regex syntax validity
  • Time_Format compatibility with Time_Key
  • MULTILINE_PARSER rule syntax
Common issues:
  • Invalid regex patterns
  • Time_Format doesn't match log timestamps
  • Missing Time_Key when using Time_Format
  • MULTILINE_PARSER rules don't match
Best practices:
  • Test regex patterns with sample logs
  • Use built-in parsers when possible
  • Set proper Time_Format for timestamp parsing
  • Use MULTILINE_PARSER for stack traces
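One way to approximate the "Regex syntax validity" check from Python is sketched below. Fluent Bit parsers use Ruby-style (Onigmo) regular expressions, which write named groups as `(?<name>...)`, while Python's `re` module expects `(?P<name>...)`. The sketch rewrites that one construct before compiling. It is an approximation only: other Onigmo-specific syntax is not handled, and `check_parser_regex` is a hypothetical helper name.

```python
import re

# Matches the opening of a Ruby-style named group; lookbehinds such as
# (?<=...) and (?<!...) are not matched because a name must start with a letter or "_".
NAMED_GROUP = re.compile(r"\(\?<([A-Za-z_][A-Za-z0-9_]*)>")


def check_parser_regex(pattern):
    """Return (ok, detail): detail is the sorted group names, or the compile error text."""
    translated = NAMED_GROUP.sub(r"(?P<\1>", pattern)
    try:
        compiled = re.compile(translated)
    except re.error as exc:
        return False, str(exc)
    return True, sorted(compiled.groupindex)
```

The returned group names are useful for a follow-up check, for example confirming that the group named by `Time_Key` actually exists in the pattern.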

Stage 3: Tag Consistency Check

Validate that tags flow correctly through the pipeline:
```bash
python3 scripts/validate_config.py --file <config-file> --check tags
```
Checks:
  • INPUT tags match FILTER Match patterns
  • FILTER tags match OUTPUT Match patterns
  • No orphaned filters (Match pattern doesn't match any INPUT)
  • No orphaned outputs (Match pattern doesn't match any INPUT/FILTER)
  • Tag wildcards are used correctly
Common issues:
  • FILTER Match pattern doesn't match any INPUT Tag
  • OUTPUT Match pattern doesn't match any logs
  • Typo in Match pattern
  • Incorrect wildcard usage
Example validation:
```ini
[INPUT]
    Tag    kube.*     # Produces: kube.var.log.containers.pod.log

[FILTER]
    Match  kube.*     # Matches: ✅

[OUTPUT]
    Match  app.*      # Matches: ❌ No logs will reach this output
```
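The orphan detection in the example above can be sketched as a wildcard match between `Match` patterns and concrete tags. This sketch assumes `*` is the only wildcard and that the tags have already been expanded to concrete values; the helper names are made up for illustration.

```python
import re


def match_to_regex(pattern):
    """Translate a Match pattern, where '*' is the only wildcard, into a regex."""
    parts = (re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + ".*".join(parts) + "$")


def orphaned_patterns(match_patterns, tags):
    """Return the Match patterns that select none of the given tags."""
    return [
        pattern
        for pattern in match_patterns
        if not any(match_to_regex(pattern).match(tag) for tag in tags)
    ]
```

With the tag produced in the example, `orphaned_patterns(["kube.*", "app.*"], ["kube.var.log.containers.pod.log"])` reports only `app.*`, which corresponds to the output that would receive no logs.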

Stage 4: Security Audit

Scan configuration for security issues:
```bash
python3 scripts/validate_config.py --file <config-file> --check security
```
Checks performed:
  1. Hardcoded credentials:
    • HTTP_User, HTTP_Passwd in OUTPUT
    • AWS_Access_Key, AWS_Secret_Key
    • Passwords in plain text
    • API keys and tokens
  2. TLS configuration:
    • TLS disabled for production outputs
    • tls.verify Off (man-in-the-middle risk)
    • Missing certificate files
  3. File permissions:
    • DB files readable/writable
    • Parser files exist and readable
    • Log files have appropriate permissions
  4. Network exposure:
    • INPUT plugins listening on 0.0.0.0 without auth
    • Open ports without firewall mentions
    • HTTP_Server exposed without auth
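The hardcoded-credential check above can be sketched as a scan for sensitive keys whose value is not an environment-variable reference. The key list comes from the checks above; the function name and the `${VAR}` pattern are assumptions for this example, and a real audit would likely cover more keys.

```python
import re

SENSITIVE_KEYS = {"http_user", "http_passwd", "aws_access_key", "aws_secret_key"}
ENV_REFERENCE = re.compile(r"^\$\{[A-Za-z_][A-Za-z0-9_]*\}$")


def find_hardcoded_credentials(text):
    """Return (line_number, key) for sensitive keys set to a literal value."""
    findings = []
    for number, raw in enumerate(text.splitlines(), start=1):
        parts = raw.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith(("#", "[")):
            continue  # skip blanks, comments, section headers, and keys without values
        key, value = parts
        if key.lower() in SENSITIVE_KEYS and not ENV_REFERENCE.match(value.strip()):
            findings.append((number, key))
    return findings
```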
Security best practices:
  • Use environment variables: `HTTP_User ${ES_USER}`
  • Enable TLS: `tls On`
  • Verify certificates: `tls.verify On`
  • Don't listen on 0.0.0.0 for sensitive inputs
  • Use authentication for HTTP endpoints
Auto-fix suggestions:
```ini
# Before (insecure)
[OUTPUT]
    HTTP_User    admin
    HTTP_Passwd  password123

# After (secure)
[OUTPUT]
    HTTP_User    ${ES_USER}
    HTTP_Passwd  ${ES_PASSWORD}
```

Stage 5: Performance Analysis

Analyze configuration for performance issues:
```bash
python3 scripts/validate_config.py --file <config-file> --check performance
```
Checks:
  1. Buffer limits:
    • Mem_Buf_Limit is set on all tail inputs
    • storage.total_limit_size is set on outputs
    • Limits are reasonable (not too small or too large)
  2. Flush intervals:
    • Flush interval is appropriate (1-5 sec typical)
    • Not too low (high CPU) or too high (high memory)
  3. Resource usage:
    • Skip_Long_Lines enabled (prevents hang)
    • Refresh_Interval set (file discovery)
    • Compression enabled on network outputs
  4. Kubernetes-specific:
    • Buffer_Size 0 for kubernetes filter (recommended)
    • Mem_Buf_Limit not too low for container logs
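Checking whether a buffer limit is "reasonable" first requires turning values such as `50MB` or `5G` into bytes. The sketch below shows one way to do that. The decimal multipliers and the low/high thresholds are assumptions chosen for illustration, not values taken from the validator script.

```python
import re

# Assumption: decimal multipliers (1 KB = 1000 bytes); adjust if your build differs.
UNITS = {"": 1, "K": 1000, "M": 1000**2, "G": 1000**3}
SIZE_RE = re.compile(r"^(\d+)\s*([KMG]?)B?$", re.IGNORECASE)


def parse_size(value):
    """Convert a size string such as '50MB' or '5G' into a byte count."""
    match = SIZE_RE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognised size '{value}'")
    return int(match.group(1)) * UNITS[match.group(2).upper()]


def check_mem_buf_limit(value, low=5 * 1000**2, high=500 * 1000**2):
    """Return a warning string, or None when the limit is inside the assumed range."""
    size = parse_size(value)
    if size < low:
        return "Mem_Buf_Limit may be too small"
    if size > high:
        return "Mem_Buf_Limit may be too large"
    return None
```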
Performance recommendations:
```ini
# Good configuration
[SERVICE]
    Flush             1      # 1 second: good balance

[INPUT]
    Mem_Buf_Limit     50MB   # Prevents OOM
    Skip_Long_Lines   On     # Prevents hang
    Refresh_Interval  10     # File discovery every 10s

[OUTPUT]
    storage.total_limit_size  5G    # Disk buffer limit
    Retry_Limit               3     # Don't retry forever
    Compress                  gzip  # Reduce bandwidth
```

Stage 6: Best Practice Validation

Check against Fluent Bit best practices:
```bash
python3 scripts/validate_config.py --file <config-file> --check best-practices
```
Checks:
  1. Required configurations:
    • SERVICE section exists
    • At least one INPUT
    • At least one OUTPUT
    • HTTP_Server enabled (for health checks)
  2. Kubernetes configurations:
    • kubernetes filter used for K8s logs
    • Proper Kube_URL, Kube_CA_File, Kube_Token_File
    • Exclude_Path to prevent log loops
    • DB file for position tracking
  3. Reliability:
    • Retry_Limit set on outputs
    • DB file for tail inputs
    • storage.type filesystem for critical logs
  4. Observability:
    • HTTP_Server enabled
    • storage.metrics enabled
    • Proper Log_Level (info or debug)
Best practice checklist:
  • ✅ SERVICE section with Flush parameter
  • ✅ HTTP_Server enabled for health checks
  • ✅ Mem_Buf_Limit on all tail inputs
  • ✅ DB file for tail inputs (position tracking)
  • ✅ Retry_Limit on all outputs
  • ✅ storage.total_limit_size on outputs
  • ✅ TLS enabled for production
  • ✅ Environment variables for credentials
  • ✅ kubernetes filter for K8s environments
  • ✅ Exclude_Path to prevent log loops

Stage 7: Dry-Run Testing

Test configuration with Fluent Bit dry-run (if binary available):
```bash
fluent-bit -c <config-file> --dry-run
```
This catches:
  • Configuration parsing errors
  • Plugin loading errors
  • Parser syntax errors
  • File permission issues
  • Missing dependencies
Common errors:
  1. Parser file not found:
[error] [config] parser file 'parsers.conf' not found
Fix: Create parser file or update Parsers_File path
  1. Plugin not found:
[error] [plugins] invalid plugin 'unknownplugin'
Fix: Check plugin name spelling or install plugin
  1. Invalid parameter:
[error] [input:tail] invalid property 'InvalidParam'
Fix: Remove invalid parameter or check documentation
  1. Permission denied:
[error] cannot open /var/log/containers/*.log
Fix: Check file permissions or run with appropriate user
If fluent-bit binary is not available:
  • Skip this stage
  • Document that dry-run testing was skipped
  • Recommend testing in development environment
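The "skip if the binary is missing" rule above can be sketched with the standard library. The wrapper below runs the dry-run command shown earlier only when the binary is found on `PATH`. Treating exit code 0 as a pass is an assumption of this sketch, and `dry_run` is a hypothetical helper name.

```python
import shutil
import subprocess


def dry_run(config_path, binary="fluent-bit"):
    """Run the dry-run if the binary is on PATH; return (status, output)."""
    executable = shutil.which(binary)
    if executable is None:
        # Document the skip instead of failing the whole validation.
        return "skipped", f"{binary} not found on PATH; dry-run testing was skipped"
    result = subprocess.run(
        [executable, "-c", config_path, "--dry-run"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Assumption: a zero exit code means the configuration was accepted.
    status = "passed" if result.returncode == 0 else "failed"
    return status, result.stdout + result.stderr
```

The returned status string can be carried into the final report so that a skipped dry-run is visible to the reader.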

Stage 8: Documentation Lookup (if needed)

If configuration uses unfamiliar plugins or parameters:
Try context7 MCP first:
Use mcp__context7__resolve-library-id with "fluent-bit"
Then use mcp__context7__get-library-docs with:
- context7CompatibleLibraryID: /fluent/fluent-bit-docs
- topic: "<plugin-type> <plugin-name> configuration"
- page: 1
Fallback to WebSearch:
Search query: "fluent-bit <plugin-type> <plugin-name> configuration parameters site:docs.fluentbit.io"

Examples:
- "fluent-bit output elasticsearch configuration parameters site:docs.fluentbit.io"
- "fluent-bit filter kubernetes configuration parameters site:docs.fluentbit.io"
Extract information:
  • Required parameters
  • Optional parameters and defaults
  • Valid value ranges
  • Example configurations

Stage 9: Report and Fix Issues

After validation, present comprehensive findings:
1. Summarize all issues:
Validation Report for fluent-bit.conf
=====================================

Errors (3):
  - [Line 15] OUTPUT elasticsearch missing required parameter 'Host'
  - [Line 25] FILTER Match pattern 'app.*' doesn't match any INPUT tags
  - [Line 8] INPUT tail missing Mem_Buf_Limit (OOM risk)

Warnings (2):
  - [Line 30] OUTPUT elasticsearch has hardcoded password (security risk)
  - [Line 12] INPUT tail missing DB file (no crash recovery)

Info (1):
  - [Line 3] SERVICE Flush interval is 10s (consider reducing for lower latency)

Best Practices (2):
  - Consider enabling HTTP_Server for health checks
  - Consider enabling compression on OUTPUT elasticsearch
2. Categorize by severity:
  • Errors (must fix): Configuration won't work, Fluent Bit won't start
  • Warnings (should fix): Configuration works but has issues
  • Info (consider): Optimization opportunities
  • Best Practices: Recommended improvements
3. Propose specific fixes:
```ini
# Fix 1: Add missing Host parameter
[OUTPUT]
    Name   es
    Match  *
    Host   elasticsearch.logging.svc  # Added
    Port   9200

# Fix 2: Add Mem_Buf_Limit to prevent OOM
[INPUT]
    Name           tail
    Tag            kube.*
    Path           /var/log/containers/*.log
    Mem_Buf_Limit  50MB  # Added

# Fix 3: Use environment variable for password
[OUTPUT]
    Name         es
    HTTP_User    admin
    HTTP_Passwd  ${ES_PASSWORD}  # Changed from hardcoded
```
4. Get user approval via AskUserQuestion
5. Apply approved fixes using Edit tool
6. Re-run validation to confirm
7. Provide completion summary:
✅ Validation Complete - 5 issues fixed
Fixed Issues:
  • fluent-bit.conf:15 - Added missing Host parameter to OUTPUT elasticsearch
  • fluent-bit.conf:8 - Added Mem_Buf_Limit 50MB to INPUT tail
  • fluent-bit.conf:30 - Changed hardcoded password to environment variable
  • fluent-bit.conf:12 - Added DB file for crash recovery
  • fluent-bit.conf:25 - Fixed FILTER Match pattern to match INPUT tags
Validation Status: All checks passed ✅
  • Structure: Valid
  • Syntax: Valid
  • Tags: Consistent
  • Security: No issues
  • Performance: Optimized
  • Best Practices: Compliant
  • Dry-run: Passed (if applicable)
8. Report-only summary (when user declines fixes):
If the user chooses not to apply fixes, provide a report-only summary:
📋 Validation Report Complete - No fixes applied
Summary:
  • Errors: 2 (must fix before deployment)
  • Warnings: 16 (should fix)
  • Info: 15 (optimization suggestions)
Critical Issues Requiring Attention:
  • [Line 5] Invalid Log_Level 'invalid_level'
  • [Line 52] [OUTPUT opentelemetry] missing required parameter 'Host'
Recommendations:
  • Review the errors above before deploying this configuration
  • Consider addressing warnings to improve reliability and security
  • Run validation again after manual fixes: `python3 scripts/validate_config.py --file <config> --check all`
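The report layout from step 1 of this stage, grouped by the severities from step 2, could be produced by something like the sketch below. The tuple shape `(severity, line, message)` and the function name `format_report` are assumptions for illustration.

```python
from collections import defaultdict

# Severity order follows the categories used in the report above.
SEVERITY_ORDER = ["Errors", "Warnings", "Info", "Best Practices"]


def format_report(filename, findings):
    """Build a text report; findings is an iterable of (severity, line_or_None, message)."""
    grouped = defaultdict(list)
    for severity, line, message in findings:
        prefix = f"[Line {line}] " if line is not None else ""
        grouped[severity].append(f"  - {prefix}{message}")

    title = f"Validation Report for {filename}"
    lines = [title, "=" * len(title)]
    for severity in SEVERITY_ORDER:
        entries = grouped.get(severity, [])
        if entries:
            lines += ["", f"{severity} ({len(entries)}):", *entries]
    return "\n".join(lines)
```

Findings may arrive in any order; the fixed severity order keeps errors at the top of the report.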

Common Issues and Solutions

Configuration Errors

Issue: Parser file not found
[error] [config] parser file 'parsers.conf' not found
Solution:
  • Verify Parsers_File path in SERVICE section
  • Check if file exists at specified location
  • Use relative path from config file location
Issue: Missing required parameter
[error] [output:es] property 'Host' not set
Solution:
  • Add required parameter to OUTPUT section
  • Check documentation for required fields
Issue: Invalid plugin name
[error] [plugins] invalid plugin 'unknownplugin'
Solution:
  • Check plugin name spelling
  • Verify plugin is available (may need installation)
  • Consult documentation for correct plugin names

Tag Routing Issues

Issue: No logs reaching output
Logs are generated but don't appear in output.
Debug:
  1. Check INPUT Tag matches FILTER Match
  2. Check FILTER Match/tag_prefix matches OUTPUT Match
  3. Enable debug logging: `Log_Level debug`
  4. Check for grep filters excluding all logs
Solution:
```ini
[INPUT]
    Tag    kube.*

[FILTER]
    Match  kube.*    # Must match INPUT Tag

[OUTPUT]
    Match  kube.*    # Must match INPUT or FILTER tag
```

Memory Issues

Issue: Fluent Bit OOM killed
Container or process killed due to memory.
Solution:
  • Add Mem_Buf_Limit to all tail inputs
  • Reduce Mem_Buf_Limit values
  • Set storage.total_limit_size on outputs
  • Increase Flush interval (batch more)
  • Add log filtering to reduce volume

Security Issues

Issue: Hardcoded credentials in config
```ini
[OUTPUT]
    HTTP_Passwd  secretpassword
```
Solution:
  • Use environment variables:
```ini
[OUTPUT]
    HTTP_Passwd  ${ES_PASSWORD}
```
  • Mount secrets in Kubernetes
  • Use IAM roles for cloud services (AWS, GCP, Azure)
Issue: TLS disabled or not verified
```ini
[OUTPUT]
    tls On
    tls.verify Off
```
Solution:
  • Enable verification for production:
```ini
[OUTPUT]
    tls         On
    tls.verify  On
    tls.ca_file /path/to/ca.crt
```

Integration with fluentbit-generator

This validator is automatically invoked by the fluentbit-generator skill after generating configurations. It can also be used standalone to validate existing configurations.
Generator workflow:
  1. Generate configuration using fluentbit-generator
  2. Automatically validate using fluentbit-validator
  3. Fix any issues found
  4. Re-validate until all checks pass
  5. Deploy with confidence

Resources

scripts/

validate_config.py
  • Main validation script with all checks integrated in a single file
  • Usage: `python3 scripts/validate_config.py --file <config> --check <type>`
  • Available check types: `all`, `structure`, `syntax`, `sections`, `tags`, `security`, `performance`, `best-practices`, `dry-run`
  • Comprehensive 1000+ line validator covering all validation stages
  • Includes syntax validation, section validation, tag consistency, security audit, performance analysis, and best practices
  • Returns detailed error messages with line numbers
  • Supports JSON output format: `--json`
validate.sh
  • Convenience wrapper script for easier invocation
  • Usage: `bash scripts/validate.sh <config-file>`
  • Automatically calls validate_config.py with proper Python interpreter
  • Simplifies command-line usage

tests/

Test Configuration Files:
  • `valid-basic.conf` - Valid basic Kubernetes logging setup
  • `valid-multioutput.conf` - Valid configuration with multiple outputs
  • `valid-opentelemetry.conf` - Valid OpenTelemetry output configuration (Fluent Bit 2.x+)
  • `invalid-missing-required.conf` - Missing required parameters
  • `invalid-security-issues.conf` - Security vulnerabilities (hardcoded credentials, disabled TLS)
  • `invalid-opentelemetry.conf` - OpenTelemetry configuration errors
  • `invalid-tag-mismatch.conf` - Tag routing issues
Running Tests:
```bash
# Test on valid config
python3 scripts/validate_config.py --file tests/valid-basic.conf

# Test on invalid config (should report errors)
python3 scripts/validate_config.py --file tests/invalid-security-issues.conf

# Test all configs
for config in tests/*.conf; do
    echo "Testing $config"
    python3 scripts/validate_config.py --file "$config"
done
```

Documentation Sources

Based on comprehensive research from: