
Apache Kafka: Distributed Event Streaming


Kafka is designed to handle trillions of events a day. At its core, it is a distributed, partitioned, and replicated commit log service: an immutable, append-only record of events.

🏗️ Core Architecture

  • Broker: A Kafka server that stores data and serves producer and consumer requests; a cluster is made up of multiple brokers.
  • Topic: A named category or feed to which records are published.
  • Partition: Topics are divided into partitions, the unit of scalability and parallelism; each partition is an ordered, append-only log.
  • Segment: The actual files on disk in which a partition's data is written.
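The topic/partition relationship can be sketched in a few lines. This is a simplified in-memory model, not Kafka's API: each partition is a plain list standing in for the on-disk segment files, and Python's built-in `hash()` stands in for the murmur2 hash the real producer applies to a record's key.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Simplified model: a topic is a fixed set of append-only partitions."""
    name: str
    num_partitions: int
    partitions: list = field(default_factory=list)

    def __post_init__(self):
        # One append-only log per partition (stand-in for segment files).
        self.partitions = [[] for _ in range(self.num_partitions)]

    def partition_for(self, key: str) -> int:
        # Records with the same key always land in the same partition,
        # which is what preserves per-key ordering.
        return hash(key) % self.num_partitions

orders = Topic("orders", num_partitions=3)
p = orders.partition_for("customer-42")
assert p == orders.partition_for("customer-42")  # deterministic routing
```

Because the partition is chosen from the key alone, all events for `customer-42` share one partition and are therefore read back in the order they were written.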

🛠️ How it Works

  1. Producers send records to topics.
  2. Kafka appends the record to the end of a partition.
  3. Consumers read records from partitions and track their own offset (their position in the log).
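The three steps above can be sketched as a tiny append-log simulation. The function names (`produce`, `poll`) are illustrative, not Kafka's client API:

```python
log = []                        # one partition's commit log

def produce(record):
    log.append(record)          # step 2: append to the end of the partition
    return len(log) - 1         # the record's offset

consumer_offset = 0             # step 3: the consumer tracks its own position

def poll():
    global consumer_offset
    records = log[consumer_offset:]
    consumer_offset = len(log)  # advance past what was just read
    return records

produce("a")
produce("b")
print(poll())   # → ['a', 'b']
print(poll())   # → []  (nothing new since the last poll)
```

Note that reading does not remove anything: the log is unchanged after `poll()`, which is what makes the replayability guarantee below possible.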

📉 Consumer Groups

Kafka scales consumption by grouping consumers: the partitions of a topic are divided among the members of a consumer group.

  • Each consumer in a group reads from an exclusive subset of the partitions, so each record is processed by only one member of the group.
  • If a consumer fails, its partitions are reassigned to other members (Rebalancing).
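A minimal sketch of round-robin partition assignment makes rebalancing concrete. This is a simplification of the real group protocol (which is coordinated by a broker), but the invariant is the same: every partition is owned by exactly one group member.

```python
def assign(partitions, consumers):
    """Round-robin: deal partitions out to consumers one at a time."""
    assignment = {c: [] for c in consumers}
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment

partitions = [0, 1, 2, 3]
print(assign(partitions, ["c1", "c2"]))  # → {'c1': [0, 2], 'c2': [1, 3]}

# If c2 fails, a rebalance reassigns its partitions to the survivors:
print(assign(partitions, ["c1"]))        # → {'c1': [0, 1, 2, 3]}
```

This also shows why running more consumers than partitions is wasteful: the extra members would be assigned nothing.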

📊 Key Guarantees

  • Ordering: Kafka guarantees the order of messages within a partition (not across partitions).
  • Persistence: Messages are written to disk and replicated.
  • Replayability: Consumers can reset their offset to re-process old data.
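Replayability falls directly out of the immutable log, continuing the simplified model used above: reading never deletes a record, so a consumer can simply move its offset backwards (comparable to seeking to "earliest" with a real Kafka consumer).

```python
log = ["price=10", "price=12", "price=11"]   # immutable partition log

offset = len(log)          # consumer is fully caught up
offset = 0                 # reset the offset: reprocess from the start
reprocessed = log[offset:]
print(reprocessed)         # → ['price=10', 'price=12', 'price=11']
```

The same mechanism backs recovery after a bad deploy: fix the consumer, rewind the offset, and reprocess the history.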

💡 Best Use Cases

  • Event Sourcing: Store the history of state changes.
  • Log Aggregation: Collect logs from hundreds of services.
  • Stream Processing: Real-time data transformation with Kafka Streams or Flink.
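The event-sourcing pattern can be sketched in a few lines: current state is never stored directly, but derived by replaying the full history of state-change events in log order. The event fields below (`account`, `delta`) are illustrative.

```python
# Hypothetical history of state changes, as it would sit in a Kafka topic.
events = [
    {"account": "A", "delta": +100},
    {"account": "A", "delta": -30},
    {"account": "B", "delta": +50},
]

balances = {}
for evt in events:                  # replay in log order
    acct = evt["account"]
    balances[acct] = balances.get(acct, 0) + evt["delta"]

print(balances)  # → {'A': 70, 'B': 50}
```

Because Kafka preserves per-partition order and lets consumers rewind, the derived state can always be rebuilt from scratch by replaying the topic from offset 0.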