Apache Kafka: Distributed Event Streaming
Kafka is designed to handle trillions of events a day. At its core, it is a distributed, partitioned, and replicated commit log: records are appended immutably and retained for later replay.
🏗️ Core Architecture
- Broker: A Kafka server that stores data and serves client requests; a cluster is made up of multiple brokers.
- Topic: A named category or feed to which records are published.
- Partition: Topics are split into partitions for scalability and parallelism; each partition is an ordered, append-only log.
- Segment: Each partition is stored on disk as a sequence of segment files, to which records are written.
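The hierarchy above (broker → topic → partition → segment) can be sketched as plain data structures. This is a minimal in-memory model with made-up names, not the real Kafka storage engine:

```python
from dataclasses import dataclass, field

# Illustrative model only: a broker holds topics, a topic holds partitions,
# and a partition is persisted as a sequence of segment files.

@dataclass
class Segment:
    base_offset: int                       # offset of the first record in this file
    records: list = field(default_factory=list)

@dataclass
class Partition:
    id: int
    segments: list = field(default_factory=list)

@dataclass
class Topic:
    name: str
    partitions: list = field(default_factory=list)

@dataclass
class Broker:
    id: int
    topics: dict = field(default_factory=dict)

broker = Broker(id=0)
broker.topics["orders"] = Topic(
    "orders",
    [Partition(0, [Segment(0)]), Partition(1, [Segment(0)])],
)
print(len(broker.topics["orders"].partitions))  # 2
```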
🛠️ How it Works
- Producers send records to topics.
- Kafka appends each record to the end of one of the topic's partitions.
- Consumers read records from partitions and track their own offset (their position in the partition's log).
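The produce/consume flow above can be modeled in a few lines. This is a toy sketch (the class names are assumptions, not Kafka APIs): the "partition" is just a Python list, and the key point is that the read position lives in the consumer, not the broker:

```python
# Toy model of Kafka's append-only flow -- not a real client.

class Partition:
    def __init__(self):
        self.log = []

    def append(self, record):
        """Producer path: records are only ever appended to the end."""
        self.log.append(record)
        return len(self.log) - 1        # the record's offset

class Consumer:
    def __init__(self, partition):
        self.partition = partition
        self.offset = 0                 # the consumer tracks its own position

    def poll(self):
        """Return everything from the current offset onward, then advance."""
        records = self.partition.log[self.offset:]
        self.offset = len(self.partition.log)
        return records

p = Partition()
for r in ["a", "b", "c"]:
    p.append(r)

c = Consumer(p)
print(c.poll())   # ['a', 'b', 'c']
p.append("d")
print(c.poll())   # ['d']
```

Because the broker never deletes records on read, a second consumer with its own offset can read the same partition independently.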
📉 Consumer Groups
Kafka allows multiple consumers to read from the same topic by grouping them.
- Each consumer in a group is assigned an exclusive subset of the partitions, so every partition is read by exactly one consumer in the group.
- If a consumer fails, its partitions are reassigned to the remaining members (rebalancing).
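A sketch of the assignment idea (illustrative only; real Kafka ships several assignor strategies and this round-robin rule is just one simple stand-in): every partition maps to exactly one live consumer, and removing a consumer re-runs the assignment over the survivors:

```python
# Hypothetical round-robin partition assignment for a consumer group.

def assign(partitions, consumers):
    """Assign sorted partition i to consumer i % n; each partition gets one owner."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(sorted(partitions)):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3, 4, 5]
group = ["c1", "c2", "c3"]
print(assign(partitions, group))
# {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}

# "c2" fails -> rebalance: its partitions are spread over the survivors.
group.remove("c2")
print(assign(partitions, group))
# {'c1': [0, 2, 4], 'c3': [1, 3, 5]}
```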
📊 Key Guarantees
- Ordering: Kafka guarantees the order of messages within a partition (there is no ordering guarantee across partitions).
- Persistence: Messages are written to disk and replicated across brokers.
- Replayability: Consumers can reset their offset to re-process old data.
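Since ordering holds only within a partition, the usual way to get per-key ordering is to route all records with the same key to the same partition. Kafka's Java client hashes keys with murmur2; this sketch substitutes Python's `zlib.crc32` purely for illustration:

```python
# Sketch of key-based partitioning (crc32 stands in for Kafka's murmur2).
import zlib

NUM_PARTITIONS = 4

def partition_for(key: bytes) -> int:
    """Map a record key to a partition deterministically."""
    return zlib.crc32(key) % NUM_PARTITIONS

# Every event keyed "user-42" lands on the same partition, so events for
# that user stay in order; replaying them just means rewinding the offset
# on that one partition.
print(partition_for(b"user-42") == partition_for(b"user-42"))  # True
```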
💡 Best Use Cases
- Event Sourcing: Store the history of state changes.
- Log Aggregation: Collect logs from hundreds of services.
- Stream Processing: Real-time data transformation with Kafka Streams or Flink.