kafka-perf-review

Kafka Performance Configuration Review

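Reviews producer and consumer configurations in both the live cluster and the codebase for performance anti-patterns. Most of these settings carry the same names across Kafka client libraries because they are standard Kafka client configuration properties, although a few are not available in every client (librdkafka, for example, has no max.poll.records).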

Workflow

Copy this checklist and track your progress:
Performance Review Progress:
- [ ] Step 1: Inspect live cluster configs
- [ ] Step 2: Scan codebase for producer/consumer configs
- [ ] Step 3: Audit producer configs
- [ ] Step 4: Audit consumer configs
- [ ] Step 5: Cross-reference cluster and code configs
- [ ] Step 6: Generate report
  1. Inspect live cluster configs via Lenses MCP
  2. Scan codebase for producer/consumer config properties (see references/producer-defaults.md and references/consumer-defaults.md)
  3. Audit producer configs against recommended values
  4. Audit consumer configs against recommended values
  5. Cross-reference cluster and code configs
  6. Report findings with current values, recommended values and trade-off explanations

Step 1: Live Cluster Inspection

Use Lenses MCP tools to check cluster-side performance configs:
  • get_topic - topic-level configs affecting performance (min.insync.replicas, compression.type, max.message.bytes)
  • get_topic_broker_configs - broker-level configs (message.max.bytes, replica.fetch.max.bytes, num.io.threads)
  • get_topic_partitions - message distribution across partitions (detect skew where one partition has significantly more bytes than others)
  • get_dataset_message_metrics - message throughput over time to identify bottlenecks or capacity headroom
Expected output: Topic-level performance configs, partition distribution and throughput metrics.
Validation: If MCP calls fail, proceed with codebase-only analysis and note the limitation in the report.
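
The skew check described for get_topic_partitions can be sketched as a comparison of each partition's byte count against the median. This is only an illustration: the input shape (a mapping of partition id to bytes), the function name, the 2x ratio and the 1 MB minimum volume are all assumptions, not values defined by Lenses or by this skill.

```python
from statistics import median


def detect_partition_skew(bytes_by_partition, ratio_threshold=2.0,
                          min_total_bytes=1_000_000):
    """Return the ids of skewed partitions, or None if the check is inconclusive.

    bytes_by_partition: hypothetical mapping {partition_id: bytes}, built from
    whatever get_topic_partitions returns in your environment.
    """
    if len(bytes_by_partition) < 2:
        return None  # a single partition cannot be skewed relative to others
    if sum(bytes_by_partition.values()) < min_total_bytes:
        return None  # too little data for a meaningful comparison

    typical = median(bytes_by_partition.values())
    if typical == 0:
        # Most partitions are empty: any partition holding data stands out.
        return sorted(p for p, b in bytes_by_partition.items() if b > 0)
    return sorted(
        p for p, b in bytes_by_partition.items() if b / typical >= ratio_threshold
    )


sizes = {0: 4_000_000, 1: 4_200_000, 2: 12_500_000, 3: 3_900_000}
print(detect_partition_skew(sizes))  # [2]
```

Returning None for low-volume topics keeps "no skew" and "not enough data to tell" distinct in the report.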

Step 2: Codebase Inspection

Search the codebase for Kafka producer and consumer configuration properties. Consult references/producer-defaults.md for the full list of producer properties and references/consumer-defaults.md for consumer properties.
Also search for anti-patterns listed in references/producer-defaults.md:
  • Synchronous produce calls (.get(), .result(), flush() after every send)
  • Missing delivery callbacks / error handlers
  • Missing graceful shutdown / rebalance listeners
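
One possible way to run this search is a small line-based scanner that records file and line references. The sketch below is a heuristic under stated assumptions: the key list is limited to the properties named in this document, the regular expressions and function names are illustrative, and framework-prefixed keys (for example Spring Boot's) are deliberately not matched.

```python
import re
from pathlib import Path

# Property names taken from the audit steps in this document.
CONFIG_KEYS = [
    "acks", "batch.size", "linger.ms", "compression.type",
    "enable.idempotence", "retries", "max.poll.records",
    "max.poll.interval.ms", "auto.offset.reset", "enable.auto.commit",
    "fetch.min.bytes",
]
# A key counts only when it stands alone and is followed by '=', ':' or ','.
CONFIG_RE = re.compile(
    r"(?<![\w.-])(" + "|".join(re.escape(k) for k in CONFIG_KEYS)
    + r")(?![\w.-])[\"']?\s*[=:,]"
)
SYNC_SEND_RE = re.compile(r"\.send\(.*\)\s*\.(get|result)\(")
SEND_CALL_RE = re.compile(r"\.(send|produce)\(")


def scan_text(label, text):
    """Return (label, line_number, kind, detail) tuples for one file's text."""
    findings = []
    prev = ""  # last non-blank line seen
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in CONFIG_RE.finditer(line):
            findings.append((label, lineno, "config", match.group(1)))
        if SYNC_SEND_RE.search(line):
            findings.append((label, lineno, "anti-pattern", "blocking on send result"))
        if ".flush(" in line and SEND_CALL_RE.search(prev):
            findings.append((label, lineno, "anti-pattern", "flush() right after send"))
        if line.strip():
            prev = line
    return findings


def scan_tree(root, suffixes=(".java", ".py", ".properties", ".yml", ".yaml")):
    """Scan every file under root whose suffix is in suffixes."""
    findings = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            findings.extend(scan_text(str(path), path.read_text(errors="ignore")))
    return findings


sample = 'props.put("linger.ms", 0);\nproducer.send(record).get();\n'
for finding in scan_text("Sample.java", sample):
    print(finding)
```

Bare names such as acks or retries can match unrelated variables, so treat the hits as candidates to review rather than confirmed findings.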

Step 3: Audit Producer Configs

Compare found producer configs against the recommended values in references/producer-defaults.md. Key areas: acks, batch.size, linger.ms, compression.type, enable.idempotence and retries.
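
The recommended values themselves live in references/producer-defaults.md and are not reproduced here. As a complementary check, the sketch below tests only the constraint the Apache Kafka producer documentation places on idempotent producers: acks must be all, retries must be greater than 0, and max.in.flight.requests.per.connection must not exceed 5. The function name and the choice to inspect only explicitly set values are assumptions for illustration.

```python
def check_idempotence_consistency(config):
    """Return conflicts between enable.idempotence and related producer settings.

    config: mapping of property name to value as found in the codebase.
    Only explicitly set properties are checked; client defaults are not assumed.
    """
    issues = []
    if str(config.get("enable.idempotence", "")).lower() != "true":
        return issues  # constraint applies only to idempotent producers

    acks = config.get("acks")
    if acks is not None and str(acks) not in ("all", "-1"):
        issues.append(f"acks={acks}: idempotence requires acks=all")

    retries = config.get("retries")
    if retries is not None and int(retries) <= 0:
        issues.append(f"retries={retries}: idempotence requires retries > 0")

    in_flight = config.get("max.in.flight.requests.per.connection")
    if in_flight is not None and int(in_flight) > 5:
        issues.append(
            f"max.in.flight.requests.per.connection={in_flight}: must be <= 5"
        )
    return issues


found = {"enable.idempotence": "true", "acks": "1", "retries": "0"}
for issue in check_idempotence_consistency(found):
    print(issue)
```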

Step 4: Audit Consumer Configs

Compare found consumer configs against the recommended values in references/consumer-defaults.md. Key areas: max.poll.records, max.poll.interval.ms, auto.offset.reset, enable.auto.commit and fetch.min.bytes.
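
Two of these properties interact: if processing one batch of max.poll.records takes longer than max.poll.interval.ms, the consumer is considered failed and the group rebalances. A rough budget check might look like the sketch below. The per-record processing time is a hypothetical measurement you would supply, and the function name is illustrative; 500 and 300000 are the Java client defaults for the two properties.

```python
def poll_budget(max_poll_records, per_record_ms, max_poll_interval_ms):
    """Estimate whether one full batch can be processed between two polls.

    per_record_ms is an assumed average processing time per record, measured
    by the application; it is not a Kafka property.
    """
    worst_case_ms = max_poll_records * per_record_ms
    return {
        "worst_case_ms": worst_case_ms,
        "headroom_ms": max_poll_interval_ms - worst_case_ms,
        "at_risk": worst_case_ms >= max_poll_interval_ms,
    }


# Java client defaults with a hypothetical 700 ms of work per record.
print(poll_budget(max_poll_records=500, per_record_ms=700,
                  max_poll_interval_ms=300_000))
# {'worst_case_ms': 350000, 'headroom_ms': -50000, 'at_risk': True}
```

When the result is at risk, the usual trade-off is between lowering max.poll.records (smaller batches, more polls) and raising max.poll.interval.ms (slower detection of stuck consumers).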

Success Criteria

Quantitative

  • Triggers on 90% of performance-related queries (test with 10-20 varied phrasings)
  • Completes review in under 15 tool calls (MCP + codebase search)
  • 0 failed MCP calls per run

Qualitative

  • Every finding shows current value, recommended value and trade-off explanation
  • Anti-patterns are identified with file and line references
  • Estimated throughput impact (low/medium/high) is consistently calibrated

Examples

Example 1: Routine performance review

User says: "Review Kafka performance configs for staging"
Actions:
  1. Inspect cluster-side configs for all topics in staging
  2. Scan src/ for producer/consumer property definitions
  3. Cross-reference code configs against reference tables
Result: Report with per-property findings and throughput impact estimates

Example 2: Investigating slow consumers

User says: "Why are my consumers slow? Check the performance settings."
Actions:
  1. Focus on consumer config properties in the codebase
  2. Check max.poll.records, fetch.min.bytes and enable.auto.commit
  3. Look for anti-patterns like synchronous processing
Result: Targeted report on consumer-side bottlenecks with remediation steps

Example 3: Scoped codebase review

User says: "Check Kafka configs in src/kafka/ for the production environment"
Actions:
  1. Scan only src/kafka/ for producer and consumer configs
  2. Cross-reference with live production cluster settings
Result: Focused report on a specific directory's Kafka configurations
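
As an illustration of what the cross-reference step can catch, the sketch below compares one topic's configs with one producer's configs. It assumes both are available as plain mappings. The function name is hypothetical, and max.request.size is a real producer property chosen for this example rather than one listed in this skill. The size comparison is a heuristic, since the two limits are measured at different points.

```python
def cross_reference(topic_config, producer_config):
    """Return mismatches between a topic's configs and a producer's configs."""
    findings = []

    # A topic compression.type other than "producer" makes the broker
    # recompress batches that arrive with a different codec.
    topic_codec = topic_config.get("compression.type", "producer")
    if topic_codec == "uncompressed":
        topic_codec = "none"  # same meaning as the producer-side "none"
    producer_codec = producer_config.get("compression.type", "none")
    if topic_codec != "producer" and topic_codec != producer_codec:
        findings.append(
            f"compression.type: topic={topic_codec}, producer={producer_codec} "
            "(broker will recompress)"
        )

    # A producer allowed to send larger requests than the topic accepts
    # can have oversized batches rejected by the broker.
    topic_max = topic_config.get("max.message.bytes")
    request_max = producer_config.get("max.request.size")
    if topic_max is not None and request_max is not None:
        if int(request_max) > int(topic_max):
            findings.append(
                f"max.request.size={request_max} exceeds topic "
                f"max.message.bytes={topic_max}"
            )
    return findings


topic = {"compression.type": "gzip", "max.message.bytes": "1048588"}
producer = {"compression.type": "lz4", "max.request.size": "5242880"}
for finding in cross_reference(topic, producer):
    print(finding)
```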

Troubleshooting

No Kafka config properties found in codebase

Cause: The codebase may use a framework or wrapper that hides raw Kafka properties.
Solution: Search for framework-specific config patterns (e.g., Spring Boot application.yml, Django settings). Report the framework used and suggest manual review.

Lenses MCP returns no topic data

Cause: Environment name is incorrect or Lenses agent is offline.
Solution: Run check_environment_health first. Verify the environment name matches what list_environments returns.

Partition skew detection is inconclusive

Cause: Topic has very low throughput so byte counts are similar across partitions.
Solution: Note that skew detection requires meaningful throughput. For low-volume topics, skip the skew check and note it in the report.

Output Format

Performance Review Report

Cluster-Side Findings

  • [topic-name] {property}: {current value}
    Recommendation: {recommended value} - {explanation}

Codebase Findings (Producers)

  • [file:line] {property} = {current value}
    Recommendation: {recommended value} - {explanation}

Codebase Findings (Consumers)

  • [file:line] {property} = {current value}
    Recommendation: {recommended value} - {explanation}

Anti-Patterns

  • [file:line] Description of the anti-pattern
    Recommendation: How to fix it

Summary

  • X producer issues found
  • Y consumer issues found
  • Z anti-patterns found
  • Estimated throughput impact: low/medium/high