Apache Kafka: Distributed Event Streaming
Kafka is designed to handle trillions of events a day. At its core, it is a distributed, partitioned, and replicated commit log: records are appended immutably and retained for later replay.
🏗️ Core Architecture
- Broker: A Kafka server that stores data and serves client requests; a cluster is made up of multiple brokers.
- Topic: A named category or feed to which records are published.
- Partition: Topics are split into partitions for scalability and parallelism; each partition is an ordered, append-only log.
- Segment: Each partition is stored on disk as a sequence of segment files, to which records are written.
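The hierarchy above (broker → topic → partition → segment) can be sketched as plain data structures. This is a minimal in-memory model with made-up names, not the real Kafka storage engine:

```python
from dataclasses import dataclass, field

# Illustrative model only: a broker holds topics, a topic holds partitions,
# and a partition is persisted as a sequence of segment files.

@dataclass
class Segment:
    base_offset: int                       # offset of the first record in this file
    records: list = field(default_factory=list)

@dataclass
class Partition:
    id: int
    segments: list = field(default_factory=list)

@dataclass
class Topic:
    name: str
    partitions: list = field(default_factory=list)

@dataclass
class Broker:
    id: int
    topics: dict = field(default_factory=dict)

broker = Broker(id=0)
broker.topics["orders"] = Topic(
    "orders",
    [Partition(0, [Segment(0)]), Partition(1, [Segment(0)])],
)
print(len(broker.topics["orders"].partitions))  # 2
```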
🛠️ How it Works
- Producers send records to topics.
- Kafka appends each record to the end of one of the topic's partitions.
- Consumers read records from partitions and track their own offset (their position in the partition's log).
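The produce/consume flow above can be modeled in a few lines. This is a toy sketch (the class names are assumptions, not Kafka APIs): the "partition" is just a Python list, and the key point is that the read position lives in the consumer, not the broker:

```python
# Toy model of Kafka's append-only flow -- not a real client.

class Partition:
    def __init__(self):
        self.log = []

    def append(self, record):
        """Producer path: records are only ever appended to the end."""
        self.log.append(record)
        return len(self.log) - 1        # the record's offset

class Consumer:
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0                 # the consumer tracks its own position

    def poll(self):
        """Return everything from the current offset onward, then advance."""
        records = self.partition.log[self.offset:]
        self.offset = len(self.partition.log)
        return records

p = Partition()
for r in ["a", "b", "c"]:
    p.append(r)

c = Consumer(p)
print(c.poll())   # ['a', 'b', 'c']
p.append("d")
print(c.poll())   # ['d']
```

Because the broker never deletes records on read, a second consumer with its own offset can read the same partition independently.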
📉 Consumer Groups
Kafka allows multiple consumers to read from the same topic by grouping them.
- Each consumer in a group is assigned an exclusive subset of the partitions, so every partition is read by exactly one consumer in the group.
- If a consumer fails, its partitions are reassigned to the remaining members (rebalancing).
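A sketch of the assignment idea (illustrative only; real Kafka ships several assignor strategies and this round-robin rule is just one simple stand-in): every partition maps to exactly one live consumer, and removing a consumer re-runs the assignment over the survivors:

```python
# Hypothetical round-robin partition assignment for a consumer group.

def assign(partitions, consumers):
    """Assign sorted partition i to consumer i % n; each partition gets one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]
group = ["c1", "c2", "c3"]
print(assign(partitions, group))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# "c2" fails -> rebalance: its partitions are spread over the survivors.
group.remove("c2")
print(assign(partitions, group))
# {'c1': [0, 2, 4], 'c3': [1, 3, 5]}
```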
📊 Key Guarantees
- Ordering: Kafka guarantees the order of messages within a partition (there is no ordering guarantee across partitions).
- Persistence: Messages are written to disk and replicated across brokers.
- Replayability: Consumers can reset their offset to re-process old data.
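Since ordering holds only within a partition, the usual way to get per-key ordering is to route all records with the same key to the same partition. Kafka's Java client hashes keys with murmur2; this sketch substitutes Python's `zlib.crc32` purely for illustration:

```python
# Sketch of key-based partitioning (crc32 stands in for Kafka's murmur2).
import zlib

NUM_PARTITIONS = 4

def partition_for(key: bytes) -> int:
    """Map a record key to a partition deterministically."""
    return zlib.crc32(key) % NUM_PARTITIONS

# Every event keyed "user-42" lands on the same partition, so events for
# that user stay in order; replaying them just means rewinding the offset
# on that one partition.
print(partition_for(b"user-42") == partition_for(b"user-42"))  # True
```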
💡 Best Use Cases
- Event Sourcing: Store the history of state changes.
- Log Aggregation: Collect logs from hundreds of services.
- Stream Processing: Real-time data transformation with Kafka Streams or Flink.