kafka-expert

Apache Kafka Expert


Expert guidance for Apache Kafka, event streaming, Kafka Streams, and building event-driven architectures.

Core Concepts


  • Topics, partitions, and offsets
  • Producers and consumers
  • Consumer groups
  • Kafka Streams
  • Kafka Connect
  • Exactly-once semantics
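Partitions are the unit of parallelism and ordering: the default partitioner hashes each record's key to pick a partition, which is what makes per-key ordering work. A simplified sketch of that property (using MD5 as a stand-in hash, not Kafka's actual murmur2 partitioner):

```python
import hashlib

def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition. Kafka's default partitioner
    uses murmur2 over the serialized key bytes; MD5 here is a stand-in
    that shows the property that matters: a given key always hashes to
    the same partition, so events for one key stay ordered."""
    digest = hashlib.md5(key.encode('utf-8')).digest()
    return int.from_bytes(digest[:4], 'big') % num_partitions
```

With six partitions, every event keyed by `'user-123'` lands on the same partition, which is why choosing a good partition key matters for both ordering and load balance.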

Producer

python
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers=['localhost:9092'],
    value_serializer=lambda v: json.dumps(v).encode('utf-8'),
    acks='all',  # Wait for all replicas
    retries=3
)

Send message

future = producer.send('user-events', {
    'user_id': '123',
    'event': 'login',
    'timestamp': '2024-01-01T00:00:00Z'
})

Wait for acknowledgment

record_metadata = future.get(timeout=10)
print(f"Topic: {record_metadata.topic}, Partition: {record_metadata.partition}")

producer.flush()
producer.close()
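Blocking on `future.get()` is fine for examples, but sends can stay fully asynchronous by attaching delivery callbacks instead. A sketch, assuming a kafka-python producer whose `send()` returns a future supporting `add_callback`/`add_errback`; the helper name `send_async` is ours:

```python
def on_success(record_metadata):
    # Invoked once the broker acknowledges the write
    print(f"delivered to {record_metadata.topic}"
          f"[{record_metadata.partition}] @ offset {record_metadata.offset}")

def on_error(exc):
    # Invoked after retries are exhausted: log, alert, or divert the payload
    print(f"delivery failed: {exc}")

def send_async(producer, topic, value):
    """Fire-and-track: returns immediately; callbacks fire on completion."""
    return (producer.send(topic, value)
            .add_callback(on_success)
            .add_errback(on_error))
```

Because the callbacks run on the producer's background thread, the hot path never blocks on broker round-trips; call `producer.flush()` before shutdown to drain outstanding sends.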

Consumer

python
from kafka import KafkaConsumer
import json

consumer = KafkaConsumer(
    'user-events',
    bootstrap_servers=['localhost:9092'],
    group_id='my-group',
    auto_offset_reset='earliest',
    enable_auto_commit=False,
    value_deserializer=lambda m: json.loads(m.decode('utf-8'))
)

for message in consumer:
    print(f"Received: {message.value}")

    # Process message
    process_event(message.value)

    # Manual commit
    consumer.commit()
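Committing after every message is simple but chatty; committing once per `poll()` batch amortizes the round-trips while keeping at-least-once semantics. A sketch, assuming a consumer configured as above with `enable_auto_commit=False` (the helper names are illustrative):

```python
def flatten(batch):
    """Flatten poll()'s {TopicPartition: [ConsumerRecord]} map into a
    single list of deserialized record values."""
    return [record.value for records in batch.values() for record in records]

def run(consumer, handle, poll_timeout_ms=1000):
    while True:
        batch = consumer.poll(timeout_ms=poll_timeout_ms, max_records=500)
        for value in flatten(batch):
            handle(value)
        if batch:
            # Commit only after the whole batch is processed: a crash
            # mid-batch replays from the last committed offset, giving
            # at-least-once delivery.
            consumer.commit()
```

Handlers should therefore be idempotent, since a replayed batch will re-deliver messages that were already processed before the crash.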

Kafka Streams

java
Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

StreamsBuilder builder = new StreamsBuilder();

KStream<String, String> source = builder.stream("input-topic");

// Transform and filter
KStream<String, String> transformed = source
    .filter((key, value) -> value.length() > 10)
    .mapValues(value -> value.toUpperCase());

transformed.to("output-topic");

KafkaStreams streams = new KafkaStreams(builder.build(), props);
streams.start();

Best Practices


  • Use appropriate partition keys
  • Monitor consumer lag
  • Implement idempotent producers
  • Use consumer groups for scaling
  • Set proper retention policies
  • Handle rebalancing gracefully
  • Monitor cluster metrics
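Consumer lag, the gap between a partition's log end offset and the consumer's current position, can be snapshotted from inside the application. A sketch, assuming kafka-python's `assignment()`, `end_offsets()`, and `position()` methods (the helper names are ours):

```python
def compute_lag(end_offsets, positions):
    """Lag per partition: log end offset minus the consumer's position.
    Both arguments are {partition: offset} maps."""
    return {tp: end_offsets[tp] - positions[tp] for tp in positions}

def consumer_lag(consumer):
    """Snapshot the lag for every partition currently assigned to
    `consumer`. A steadily growing lag means the consumer is falling
    behind producers and the group may need to scale out."""
    assignment = list(consumer.assignment())
    end_offsets = consumer.end_offsets(assignment)
    positions = {tp: consumer.position(tp) for tp in assignment}
    return compute_lag(end_offsets, positions)
```

In production this is usually complemented by external tooling (e.g. monitoring the group's committed offsets from outside the application), but an in-process snapshot is a cheap first alarm.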

Anti-Patterns

❌ Single-partition topics
❌ No error handling
❌ Ignoring consumer lag
❌ Producing to the wrong partitions
❌ Not using consumer groups
❌ Synchronous processing
❌ No monitoring
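One way to avoid the "no error handling" anti-pattern is a bounded retry followed by a dead-letter topic, so a poison message neither crashes the consumer nor disappears silently. A sketch; the dead-letter topic name and helper are illustrative:

```python
def process_with_dlq(producer, dlq_topic, handle, message, max_retries=3):
    """Try handle(message) up to max_retries times; on persistent
    failure, publish the message to a dead-letter topic for later
    inspection instead of blocking the partition."""
    for _ in range(max_retries):
        try:
            handle(message)
            return True
        except Exception:
            continue  # possibly transient failure: retry
    producer.send(dlq_topic, message)  # park the poison message
    return False
```

Messages on the dead-letter topic can then be inspected, fixed, and replayed without holding up the rest of the partition.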

Resources