5 Apache Kafka Security Best Practices
Apache Kafka is a powerful stream processing platform that can be found at the heart of the largest data warehouses around the world. Responsible for the heavy lifting from data sources to data sinks, Apache Kafka is capable of processing millions of records or messages per second while still maintaining sub-second end-to-end latency. However, this is only possible if we keep our Apache Kafka clusters, along with their consumers and producers, secure.
In this post we will discuss some standard Apache Kafka security best practices to help us do exactly that, including recommendations for authentication, encryption, updates, access control lists, and more.
Getting Started with Apache Kafka
Some of what follows assumes familiarity with basic Kafka concepts such as topics, partitions, and consumer groups. If you need a refresher, I recommend visiting our Kafka 101 resource hub or downloading The Decision Maker's Guide to Apache Kafka.
5 Apache Kafka Security Best Practices
“Out of the box” default configurations are great for prototyping and proof-of-concept designs, but to really unleash the performance and reliability of Kafka we must keep it secure.
We all know the easiest way to stand up any enterprise resource is to just run with its default configuration, perhaps bypassing encryption like TLS or skipping hardening tools like SELinux. But if we can’t guarantee the integrity of our services, then we cannot provide the reliability of services and accuracy of data required for enterprise operations.
With that in mind, here are a few of the basic Apache Kafka security best practices your organization should be addressing in your Apache Kafka environments.
1. Authenticate Everything
An often-ignored security practice we find when doing Kafka environment analysis for customers is client authentication. Many organizations fail to authenticate their Kafka producers and consumers, which may be understandable in some contexts. But Kafka has strong support for multiple SASL mechanisms (SASL/GSSAPI (Kerberos), SASL/OAUTHBEARER, SASL/SCRAM-SHA-256, SASL/SCRAM-SHA-512, and SASL/PLAIN), so restricting your cluster to authenticated clients is fairly straightforward, and definitely worth an organization's time.
It's unfortunately common for an organization to spend a lot of time securing its external attack vectors while ignoring its internal attack surface. This creates a hardened outer shell, but leaves the inside “gooey” and easily compromised by internal threats. Adding client authentication is a major step in hardening that “gooey” center.
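As a rough illustration of what client authentication involves, the sketch below builds the settings a producer or consumer would need for SASL/SCRAM-SHA-512 over TLS. The property keys are standard Kafka client settings; the helper names, user name, password, and truststore path are placeholders invented for this example, and the password is assumed to contain no double quotes.

```python
def scram_client_properties(username: str, password: str, truststore_path: str) -> dict[str, str]:
    """Build Kafka client settings for SASL/SCRAM-SHA-512 over TLS."""
    # The JAAS entry names the SCRAM login module and carries the credentials.
    jaas = (
        "org.apache.kafka.common.security.scram.ScramLoginModule required "
        f'username="{username}" password="{password}";'
    )
    return {
        "security.protocol": "SASL_SSL",      # authenticate with SASL, encrypt with TLS
        "sasl.mechanism": "SCRAM-SHA-512",
        "sasl.jaas.config": jaas,
        "ssl.truststore.location": truststore_path,
    }


def render_properties(settings: dict[str, str]) -> str:
    """Render settings as key=value lines, one per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Hypothetical values, for illustration only.
print(render_properties(
    scram_client_properties("orders-producer", "change-me", "/etc/kafka/client.truststore.jks")
))
```

The rendered text can be saved as a client properties file. In practice the password would come from a secrets store rather than source code.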
In addition to authenticating clients, we should be authenticating broker communications to ZooKeeper as well. ZooKeeper added support for mutual TLS (mTLS) in its 3.5.x releases, and Kafka has shipped ZooKeeper 3.5.6 since version 2.4. Starting with Kafka 2.5, brokers can authenticate to ZooKeeper with mTLS, SASL, or both together.
2. Encrypt Everything
Free, easy-to-implement cryptography is ubiquitous in the modern enterprise. With freely available PKI solutions that either self-sign or use free services like Let's Encrypt, there is very little reason to have unencrypted traffic crossing our network infrastructure. While disabled by default, encrypting communications on your Kafka cluster is fairly easy and helps ensure the integrity of your cluster.
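One way to verify that encryption is actually switched on is to audit the broker configuration for plaintext listeners. The following is a minimal sketch, not an official tool: it reads the `listeners` and `listener.security.protocol.map` settings from the text of a server.properties file and reports any listener that does not resolve to SSL or SASL_SSL. The function names and the sample configuration are made up for this example, and an unset `listeners` value is treated as a plaintext listener on port 9092, which is an assumption about the default.

```python
ENCRYPTED_PROTOCOLS = {"SSL", "SASL_SSL"}


def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def unencrypted_listeners(settings: dict[str, str]) -> list[str]:
    """Return the listeners whose security protocol does not use TLS."""
    # A listener name maps to itself unless listener.security.protocol.map overrides it.
    protocol_map = {}
    for entry in settings.get("listener.security.protocol.map", "").split(","):
        if ":" in entry:
            name, _, protocol = entry.strip().partition(":")
            protocol_map[name.upper()] = protocol.upper()

    insecure = []
    # Assumption: with no listeners setting, the broker listens in plaintext on 9092.
    for listener in settings.get("listeners", "PLAINTEXT://:9092").split(","):
        listener = listener.strip()
        if not listener:
            continue
        name = listener.split("://", 1)[0].upper()
        if protocol_map.get(name, name) not in ENCRYPTED_PROTOCOLS:
            insecure.append(listener)
    return insecure


# Hypothetical broker configuration, for illustration only.
sample = """
listeners=INTERNAL://:9092,EXTERNAL://:9093,LEGACY://:9094
listener.security.protocol.map=INTERNAL:SSL,EXTERNAL:SASL_SSL,LEGACY:PLAINTEXT
"""
print(unencrypted_listeners(parse_properties(sample)))
```

A check like this fits naturally into a deployment pipeline, failing the build when the returned list is not empty.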
Keep in mind, there are performance considerations in regard to CPU and JVM implementations when enabling cluster encryption, but the benefits of enabling encryption will almost always outweigh the performance considerations.
Also keep in mind that some older clients do not support encryption. Encrypted connections require version 0.9.0 or higher of the consumer and producer APIs (which brings us to our next security best practice).
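The version floor above is easy to check mechanically when taking inventory of client applications. This small sketch assumes client versions are plain dotted numbers such as "0.8.2.2" or "3.7.0"; the function names are invented for the example, and version strings with suffixes are not handled.

```python
MIN_ENCRYPTION_VERSION = (0, 9, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string into a tuple, padded to at least three parts."""
    parts = tuple(int(part) for part in version.split("."))
    return parts + (0,) * (3 - len(parts))


def supports_encryption(client_version: str) -> bool:
    """Return True if the client API version is 0.9.0 or higher."""
    return parse_version(client_version) >= MIN_ENCRYPTION_VERSION


# Hypothetical inventory of client versions, for illustration only.
for version in ["0.8.2.2", "0.9.0", "0.10.1.0", "3.7.0"]:
    print(version, supports_encryption(version))
```

Comparing tuples of integers avoids the trap of comparing version strings directly, where "0.10" would sort before "0.9".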
3. Update Regularly
While I would consider this a “performance tuning” best practice as well (and it is definitely applicable to more than just Kafka), keeping your software updated with the most recent bug and security fixes is a must.
We are all too familiar with looking out over our enterprise services and getting that sinking feeling in the pit of our stomachs when anyone even mentions the word “upgrade,” but it's paramount that updates get done in a timely manner. To make that sinking “pit of doom” feeling a little less pronounced, have an upgrade plan. You should have both a long-term and short-term upgrade plan within your organization.
What versions will you be running in 3-4 months?
How about in 6-12 months?
18-24 months?
These plans should include not only your infrastructure and DevOps folks, but your development teams as well. The responsibility for maintaining Kafka infrastructure like brokers and ZooKeeper, and the responsibility for maintaining consumer and producer code, will probably fall across multiple teams or groups. The people who upgrade your cluster very likely won't be the people who maintain your producer or consumer code. Coordinating your upgrade plans between these two groups is crucial, as there can be breaking changes between Kafka versions that require changes to the producer or consumer code.
Getting these changes scheduled into your developers' sprints ahead of time will take a lot of the heartburn out of upgrading your Kafka infrastructure.
4. Enable and Configure Access Control Lists
Now that we have authentication and encryption enabled and are running the latest versions, we want to make sure that we are securing “who” (consumers and producers) is talking to “what” (topics, broker configurations, etc.).
To do this, organizations should be enabling and configuring access control lists (ACLs). ACLs control a number of client operations, such as creating, deleting, or altering the configurations of topics, reading or writing events to a topic, and even managing the creation and deletion of ACLs for a topic. This step is a must in almost all multi-tenant environments, but even single-tenant environments will benefit from implementing ACLs.
Kafka uses a pluggable Authorizer and ships with an out-of-the-box implementation that uses ZooKeeper to store ACLs. You can change the Authorizer in server.properties through the authorizer.class.name setting. Kafka ACLs are defined with a general format of “Identity ‘A’ is (allowed/denied) Operation ‘B’ from host ‘C’ on any resource ‘D’ matching resource pattern ‘E’.” By default, if no resource pattern matches a given resource, then that resource is considered to have no ACLs and can only be accessed by super users. Kafka provides an authorizer CLI (kafka-acls.sh) to add, remove, or list ACLs.
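To make the ACL format above concrete, here is a deliberately simplified model of how such rules could be evaluated. It is a teaching sketch, not Kafka's authorizer: the class and function names are invented, only literal and prefixed patterns are modeled, and details of the real implementation (such as one operation implying another) are left out. It does follow the rules described above: super users are always allowed, a matching deny wins over a matching allow, and a resource with no matching ACL is off limits to everyone else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acl:
    principal: str                 # identity "A", e.g. "User:alice" or "User:*"
    permission: str                # "ALLOW" or "DENY"
    operation: str                 # operation "B", e.g. "READ", or "ALL"
    host: str                      # host "C", or "*" for any host
    resource_type: str             # e.g. "TOPIC"
    resource_name: str             # resource "D" as written in the pattern
    pattern_type: str = "LITERAL"  # pattern "E": "LITERAL" or "PREFIXED"


def pattern_matches(acl: Acl, resource_type: str, resource_name: str) -> bool:
    """Check whether the ACL's resource pattern covers the requested resource."""
    if acl.resource_type != resource_type:
        return False
    if acl.pattern_type == "PREFIXED":
        return resource_name.startswith(acl.resource_name)
    return acl.resource_name in ("*", resource_name)


def authorize(acls, super_users, principal, operation, host, resource_type, resource_name) -> bool:
    """Simplified decision: super users pass, DENY beats ALLOW, no match means no access."""
    if principal in super_users:
        return True
    matching = [
        acl for acl in acls
        if pattern_matches(acl, resource_type, resource_name)
        and acl.principal in ("User:*", principal)
        and acl.operation in ("ALL", operation)
        and acl.host in ("*", host)
    ]
    if any(acl.permission == "DENY" for acl in matching):
        return False
    return any(acl.permission == "ALLOW" for acl in matching)


# Hypothetical principals and topics, for illustration only.
acls = [
    Acl("User:alice", "ALLOW", "READ", "*", "TOPIC", "orders"),
    Acl("User:*", "ALLOW", "READ", "*", "TOPIC", "public-", "PREFIXED"),
    Acl("User:mallory", "DENY", "ALL", "*", "TOPIC", "*"),
]
super_users = {"User:admin"}
print(authorize(acls, super_users, "User:alice", "READ", "10.0.0.5", "TOPIC", "orders"))
print(authorize(acls, super_users, "User:alice", "WRITE", "10.0.0.5", "TOPIC", "orders"))
```

Walking through the rules by hand like this is a useful exercise before writing the real ACLs, since an overly broad prefixed pattern or a forgotten deny is much easier to spot in a small model.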
5. Do the Hard Things
It’s all too easy to take shortcuts when it comes to security, but putting in the extra time and effort up front can save you time and heartache down the road.
I find that SELinux is commonly thrown into this category. It is so tempting to just put SELinux into permissive mode (or disable it altogether), but this hamstrings your operating system's strongest hardening tool and defense. Because SELinux can help protect us from threats we are not even aware of, we need to take the time and effort to properly configure custom SELinux policies for our production workloads. This should apply to any production workload, not just Kafka.
While this can be a frustrating process, tools are available to help configure the proper policies so that workloads like Kafka or ZooKeeper have access to the appropriate server resources. Tools like audit2allow and sealert, together with the auditd logs, can go a long way toward eliminating that frustration. It might not be nearly as easy as just running “setenforce 0,” but I know I sleep a lot better at night when “getenforce” returns “Enforcing”.
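The “getenforce” check mentioned above can also be automated, for example as a pre-flight step before a broker is started or a deployment is signed off. The sketch below is one possible way to do that; the function names are invented for this example, and a host without the getenforce command is reported as "Unavailable" rather than treated as an error.

```python
import shutil
import subprocess


def selinux_mode() -> str:
    """Return the mode printed by getenforce, or "Unavailable" if it cannot be determined."""
    if shutil.which("getenforce") is None:
        return "Unavailable"
    result = subprocess.run(["getenforce"], capture_output=True, text=True, check=False)
    return result.stdout.strip() or "Unavailable"


def is_enforcing(mode: str) -> bool:
    """Return True only when the reported mode is Enforcing."""
    return mode.strip().lower() == "enforcing"


mode = selinux_mode()
print(f"SELinux mode: {mode} (enforcing: {is_enforcing(mode)})")
```

A deployment script could refuse to continue when `is_enforcing` returns False, which turns the habit of checking into something that cannot be skipped.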
Final Thoughts
We all know it's so much easier to NOT implement encryption or NOT implement authentication. And the fact that Kafka ships out the door with these things disabled makes NOT doing these things even more tempting. Running a simple one-line command to eliminate a problem caused by SELinux is so much simpler than using tools like audit2allow to craft custom policy.
Doing the hard things, however, is how we do our job correctly. Sure, there are use cases and contexts where we can skimp on the security details, but when it comes to our enterprise-level workloads we have to take the time to do things correctly. Considering how costly a data breach can be to both reputation and the company’s bottom line, it's always worth it to do the hard things.
Need Help With Your Kafka Deployments?
OpenLogic provides expert technical support that helps ensure your deployments are secure and performant. Talk to an open source expert today to learn more about how we can support your Kafka deployments.
Additional Resources
- White Paper - Decision Maker's Guide to Apache Kafka
- Blog - Get Ready for Kafka 4: Changes and Upgrade Considerations
- Blog - Kafka Raft Mode: Running Kafka Without ZooKeeper
- Blog - How to Develop a Winning Kafka Partition Strategy
- Blog - Using Apache Kafka for Stream Processing
- Case Study - Credit Card Processing Company Avoids Kafka Exploit
- Training Course - Building, Maintaining, and Monitoring Applications With Apache Kafka
- Blog - Exploring Kafka Connect