Blog

September 29, 2022

How to Develop a Winning Kafka Partition Strategy

Apache Kafka

For teams using Apache Kafka as a messaging system, Kafka partitions play a key role. Though technically a distributed streaming platform, Apache Kafka has excellent capabilities as a message broker, and understanding how topic partitions function within the greater architecture is essential.

In this blog, we’ll discuss Kafka partitions in more depth, with a focus on how to develop and implement a partitioning strategy based on your use case and application needs.

What Is a Kafka Partition?
How Are Kafka Partitions Used?
Common Kafka Partitioning Strategies
How to Pick the Right Kafka Partition Strategy
Final Thoughts

What Is a Kafka Partition?

In Apache Kafka, partitions are the main method of concurrency for topics. A topic, a dedicated location for events or messages, will be broken into multiple partitions among one or more Kafka brokers.

Before we jump into Kafka partitioning strategy, it’s helpful to have a high-level grasp of the structural mechanism for handling data (messages) in Kafka. Messages in Kafka are broken into topics, which are divided into partitions. As a distributed system, Kafka runs in a cluster, and each container in the cluster is called a broker. Partitioning is what enables messages to be split in parallel across several brokers in the cluster.

Using this method of parallelism, Kafka scales to support multiple consumers and producers simultaneously. This method of partitioning allows linear scaling for both consumers as well as producers. When partition replicas are introduced to the environment, this is how Kafka handles redundancy as well.

Choosing Kafka Is Easy. Deploying Kafka Can Be Difficult.
Get the Decision Maker's Guide to Apache Kafka for expert guidance on scaling Kafka, from security and cluster configuration to partition strategy and upgrade/migration best practices.
Read Now

How Are Kafka Partitions Used?

Kafka partitions work by creating multiple logs from a single topic log and spreading them across one or more brokers, as shown in the images below. As previously mentioned, partitions are what makes Kafka scalable. The number of partitions can be defined at topic creation, or they can be added using the Kafka kafka-topics.sh utility.

Many partitions equate to many clients being able to access a given topic. Keep in mind, however, while one consumer can work with many different partitions, partitions can work with a single consumer at a time. So the goal is to have parity between clients and partitions, as having more clients than partitions does not create an effective gain in throughput or performance.

Single Broker With Multiple Partitions

Multiple Partitions, Multiple Brokers, With Replica Factor 2

In the last image, each primary partition would be defined as the partition leader, with each replica defined as a follower. Producers will place messages onto the partition leader and the consumer will read from that same leader as well, while the replicas are utilized for redundancy. There is more than could be said around Kafka replication, redundancy, and in-sync replicas, but for the purposes of this article, just know that Kafka preserves data and achieves redundancy through partitions and their replicas.

Common Kafka Partitioning Strategies

There are two primary partition strategies for producers that organizations can utilize, and they each have their own benefits and drawbacks. We will discuss both these methods, as well as custom partitioning approaches.

Round Robin Partitioning

This partitioner class is the default partitioning method used when no message key is supplied. This partitioning strategy is generally utilized when messaging order is not important, and having the data evenly balanced across partitions and broker nodes is desirable. Stateless applications generally work well with round robin partitions as message order is usually not needed in stateless applications. If message order is required by the application, then round robin partitioning would not be a good option.

Message Key Partitioning

Also known as the Default Partitioner utilized in 2.4 and earlier, message key partitioning is used when a message key is provided. The key is placed through a hashing function and all messages with the same key are placed onto the same partition, preserving message order. This partitioning method is used when messaging order is important or, when messages should be grouped together. Multi-tenant environments are a common use case for this type of partitioning and messages are grouped together based on a customer key. Another thing to note is that this partitioning strategy can lead to partitions not being evenly distributed and more active messages keys having larger partitions than other less active messages keys.

Custom Partitioning

Custom partitioning is possible, as a Kafka producer can be written to follow any number of rules-based methodologies to assign messages to a particular partition. For instance, a partition can be specified in the record itself, or by utilizing the Partitioner interface provide in the kafka.clients.producer package. Most enterprise use cases will not require custom partitioning, but if you do go down this route, note that some changes have occurred between 2.4 and more recent versions of Kafka, such as 3.3. See KIP-794 for further information.

Our Kafka Expertise Runs Deep
Whether you need full-service Kafka management, LTS, or occasional technical support, OpenLogic can help.
See Kafka Solutions

How to Pick the Right Kafka Partition Strategy

Selecting the best partitioning method will largely depends on the needs of the application in question. If message ordering is important, then utilizing message key for partitioning would be preferable. If the application is stateless and message order is not required, then round robin partitioning would be fine. In the majority of use cases, one of these two strategies will be the method of choice. In situations where a different strategy needs to be defined, implementing a custom partitioning method is possible, but this will not be necessary for the vast majority of teams using Kafka as a message broker.

Final Thoughts

Topic partitions are at the heart of any Kafka strategy, environment design, and implementation. Choosing the partitioning method that fits your particular use case is the first step in ensuring long-term success with Kafka and benefitting from its remarkable durability and speed in applications.

For more Apache Kafka best practices for implementation and testing, check out the video below:

Additional Resources

Blog - Kafka Cluster Configuration Strategies
Guide - Enterprise Kafka Resources
Blog - Apache Kafka vs. Confluent Kafka
Blog - Solving Complex Kafka Issues: Enterprise Case Studies
Blog - Using Apache Kafka for Stream Processing
Webinar - Kafka Gone Wrong: How to Avoid Data Disasters
Training - Apache Kafka Training Course
Case Study - Credit Card Processing Company Avoids Kafka Exploit With Support From OpenLogic
Blog - 8 Kafka Security Best Practices

Featured Product

Spring LTS

Services

Training

Taking an Open Source Approach to Big Data Management