Introduction
Understanding how events are stored and processed in Kafka topics is crucial for maintaining data consistency and correct ordering. In this tutorial, we will explore how Kafka distributes events across partitions and how message keys can help maintain event order.
Event Storage in Kafka Topics
Kafka topics store published messages, which are distributed across multiple partitions. The way messages are assigned to partitions impacts how they are consumed and processed.
Example: Profile Updated Event Topic
Consider a topic named profile-updated-event-topic, where an event is published each time a user updates their profile.
- The user corrects the spelling of their first name and clicks Update; a new event is published and stored in partition 0.
- The user notices that the updated name accidentally ends with an extra letter “I”, modifies the name again, and clicks Update; this event is published and stored in partition 1.
- The user changes the name once more and submits another update; this event is stored in partition 2.
At this point, updates for the same user are scattered across several partitions, which can disrupt the order in which they are applied.
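For reference, a topic like this with three partitions can be created programmatically. The following is a minimal sketch using Kafka's AdminClient; the broker address localhost:9092 and the replication factor of 1 are assumptions for a local, single-broker development setup.

```java
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.List;
import java.util.Properties;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        // Assumed broker address; adjust for your environment.
        props.put("bootstrap.servers", "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Three partitions, replication factor 1 (single-broker dev setup).
            NewTopic topic = new NewTopic("profile-updated-event-topic", 3, (short) 1);
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```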
How Kafka Assigns Events to Partitions
Kafka messages are stored as key-value pairs:
- The message key is used to determine the partition.
- The message value contains the event details (e.g., a JSON payload).
When a message key is not provided, Kafka distributes events across the available partitions on its own (round-robin or sticky batching, depending on the client version), balancing the load among them. However, this can lead to problems when event order matters.
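To make the key-value structure concrete, here is a minimal sketch of building a ProducerRecord with and without a key. The topic name matches the earlier example, while the user ID and JSON payloads are placeholder values.

```java
import org.apache.kafka.clients.producer.ProducerRecord;

public class RecordKeyValueExample {
    public static void main(String[] args) {
        // Value only: the producer's partitioner decides the partition.
        ProducerRecord<String, String> withoutKey =
                new ProducerRecord<>("profile-updated-event-topic",
                        "{\"firstName\":\"Johni\"}");

        // Key + value: the key (here a placeholder user ID) drives the partition choice.
        ProducerRecord<String, String> withKey =
                new ProducerRecord<>("profile-updated-event-topic",
                        "user-123",
                        "{\"firstName\":\"John\"}");

        System.out.println("Key without key: " + withoutKey.key()); // null
        System.out.println("Key with key:    " + withKey.key());    // user-123
    }
}
```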
Why Event Order Matters
Kafka consumers process messages in parallel across partitions. This means events may not always be consumed in the exact order in which they were published. For example, if three profile update events A, B, and C are stored in different partitions, the consumer might process them in any order:
- Scenario 1: Event A → Event C → Event B
- Scenario 2: Event B → Event A → Event C
- Scenario 3: Event C → Event B → Event A
If events related to a single entity (e.g., a user profile update) are processed out of order, the user’s profile data may become inconsistent in the database.
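As a rough sketch of the consumer side (the broker address and consumer group name are assumptions), a consumer polling this topic receives records whose order is guaranteed only within each partition, not across the topic as a whole:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.List;
import java.util.Properties;

public class ProfileUpdateConsumerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "profile-update-consumer");  // assumed consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("profile-updated-event-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    // Ordering is guaranteed only within a single partition.
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```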
Using Message Keys to Maintain Order
To preserve event order, a message key should always be provided. Kafka uses the key to determine the partition in which the event is stored, so events with the same key always end up in the same partition and keep their relative order.
How Message Keys Work
- Kafka hashes the message key.
- It uses the hash value to determine the partition.
- Events with the same message key are always stored in the same partition.
- Consumers reading from that partition will process them in order.
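Conceptually, this works like hashing the key bytes and taking the result modulo the number of partitions. The sketch below illustrates the idea with a plain Java hash; Kafka's own default partitioner uses a murmur2 hash, so the actual partition numbers it assigns will differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyToPartitionSketch {

    // Simplified illustration of key-based partition selection:
    // hash the key bytes, drop the sign bit, take modulo the partition count.
    // Kafka's default partitioner uses a murmur2 hash; a plain array hash is
    // used here only to keep the sketch self-contained.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 3;
        // The same key always maps to the same partition.
        System.out.println(partitionFor("user-123", partitions));
        System.out.println(partitionFor("user-123", partitions)); // same result
        System.out.println(partitionFor("user-456", partitions)); // may differ
    }
}
```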
Example: Using User ID as a Message Key
If we use the User ID as the message key, all profile update events for a single user will be stored in the same partition:
- First update → Partition 1, Offset 0
- Second update → Partition 1, Offset 1
- Third update → Partition 1, Offset 2
Since all updates for a particular user are in the same partition, the consumer microservice will process them sequentially, maintaining the correct order.
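A minimal producer sketch of this idea follows; the broker address, user ID, and payloads are placeholder values, and the same user ID is passed as the key for every update so that all three events land in one partition with increasing offsets.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

import java.util.Properties;

public class KeyedProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        String userId = "user-123"; // placeholder user ID used as the message key

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            String[] updates = {
                    "{\"firstName\":\"Johni\"}",
                    "{\"firstName\":\"John\"}",
                    "{\"firstName\":\"Jon\"}"
            };
            for (String value : updates) {
                // Same key => same partition => updates keep their publish order.
                ProducerRecord<String, String> record =
                        new ProducerRecord<>("profile-updated-event-topic", userId, value);
                RecordMetadata metadata = producer.send(record).get();
                System.out.printf("partition %d, offset %d%n",
                        metadata.partition(), metadata.offset());
            }
        }
    }
}
```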
Choosing a Message Key
A message key can be any value, such as:
- User ID (for user-related events)
- Product ID (for product-related events)
- Order ID (for e-commerce applications)
- UUID (a randomly generated unique identifier, provided the same UUID is reused for all events of the entity)
If a string-based message key is used, you can:
- Use an existing database ID.
- Generate a UUID.
- Create a custom key format based on business logic.
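For illustration, here is a small sketch of the three options above; the ID values and the custom key format are hypothetical.

```java
import java.util.UUID;

public class MessageKeyExamples {
    public static void main(String[] args) {
        // Existing database ID reused as the key (placeholder value).
        String databaseIdKey = "user-123";

        // UUID generated once for the entity and then reused for its events.
        String uuidKey = UUID.randomUUID().toString();

        // Custom key format based on business logic (hypothetical convention).
        String customKey = "profile-" + databaseIdKey;

        System.out.println(databaseIdKey);
        System.out.println(uuidKey);
        System.out.println(customKey);
    }
}
```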
Grouping Related Events
Events related to the same entity should share the same message key, ensuring they are processed in the correct order. For example:
- All events for Product A → stored under one key.
- All events for Product B → stored under another key.
This approach guarantees that events for a given entity are always processed in the correct order, while events for different entities can still be processed in parallel across partitions.
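A brief sketch of this pattern (the topic name product-event-topic, the product IDs, and the payloads are placeholders): each event is published with its product ID as the key, so events for the same product stay in the same partition.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.List;
import java.util.Properties;

public class ProductEventProducerExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // All events for product A share one key; all events for product B share another.
            List<ProducerRecord<String, String>> events = List.of(
                    new ProducerRecord<>("product-event-topic", "product-A", "{\"event\":\"created\"}"),
                    new ProducerRecord<>("product-event-topic", "product-B", "{\"event\":\"created\"}"),
                    new ProducerRecord<>("product-event-topic", "product-A", "{\"event\":\"price-updated\"}")
            );
            events.forEach(producer::send);
            producer.flush(); // ensure everything is delivered before the producer closes
        }
    }
}
```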