Apache Kafka has had a major impact in a short time.

With the project reporting more than 60% of Fortune 100 companies using it today to modernize their data architecture, Kafka has proven to be a popular event streaming platform across a range of industries. 

Apache Kafka is an open source software platform developed by the Apache Software Foundation that can publish, subscribe to, store, and process streams of records in real time. Some use cases of Apache Kafka include messaging, website activity tracking, metrics, and log aggregation.

Basically, if you want to move massive amounts of data quickly and at scale, you want Apache Kafka. But that’s not to say you won’t encounter some challenges when using it.

Let’s take a look at some of the common challenges of using Apache Kafka in this article—and what options are available to help you deal with them.

1) Apache Kafka is hard to set up (and learn)

Setting up and managing Apache Kafka is no cakewalk.

Are you going to put it on a physical machine? In the cloud? What considerations apply to each? You have to figure out networking requirements, set up the right interfaces, and segregate systems for security.

And then, when you have it all up and running, you still have to tackle Day 2 operations and know how to diagnose and resolve problems as they arise. It’s complicated.

2) Apache Kafka isn’t super developer-friendly

Developers new to Apache Kafka might find it difficult to grasp the concept of Kafka brokers, clusters, partitions, topics, logs, and so on. The learning curve is steep. You’ll need extensive training to learn Kafka’s basic foundations and the core elements of an event streaming architecture. 

To better understand this particular challenge, we first need to familiarize ourselves with how Kafka works. 

At a high level, Apache Kafka streams data from sources (called producers) to targets (called consumers). Producers can push data into (and consumers can pull data out of) what we call Kafka topics, where our data is stored and published.
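
To make that flow concrete, here's a minimal sketch of a Java producer built on the official kafka-clients library. The broker address, the topic name (page-views), and the key-value payload are placeholders for illustration, not part of any real deployment.

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class PageViewProducer {
        public static void main(String[] args) {
            // Minimal producer configuration: where the cluster lives and
            // how to turn keys and values into bytes on the wire.
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("key.serializer", StringSerializer.class.getName());
            props.put("value.serializer", StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Each record targets a topic and carries a key-value pair.
                producer.send(new ProducerRecord<>("page-views", "user-42", "/pricing"));
            } // close() flushes any buffered records before exiting
        }
    }

A consumer is the mirror image: it subscribes to the same topic with matching deserializers and pulls records out in a polling loop.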

Each piece of data in a Kafka topic is a key-value pair, where the value can be serialized into data formats like Avro, JSON, or Protobuf. The structure of that data format is what we call a schema.

A problem developers might run into is producers and consumers using different schemas to encode and decode data: downstream consumers will start breaking if the producer's schema changes. Kafka itself performs no data verification; brokers treat each record as opaque bytes, so a mismatch only surfaces when a consumer tries to read the data.
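
As a sketch of that failure mode, assume a producer that originally published 4-byte integers switches to JSON strings while a consumer still uses an integer deserializer. The topic name, group id, and broker address below are placeholders; the broker happily stores the new bytes, and the mismatch only blows up at poll() time on the consumer side.

    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.errors.SerializationException;

    public class OrderConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
            props.put("group.id", "order-processors");        // placeholder consumer group
            props.put("auto.offset.reset", "earliest");       // read from the start of the topic
            props.put("key.deserializer",
                      "org.apache.kafka.common.serialization.StringDeserializer");
            // Still expects the old schema: a 4-byte integer payload.
            props.put("value.deserializer",
                      "org.apache.kafka.common.serialization.IntegerDeserializer");

            try (KafkaConsumer<String, Integer> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("orders"));
                // If the producer has switched to JSON strings, this poll() will
                // typically throw -- the broker never checked the bytes it stored.
                consumer.poll(Duration.ofSeconds(1));
            } catch (SerializationException e) {
                System.err.println("Schema drift hit the consumer: " + e.getMessage());
            }
        }
    }

This is the gap that schema registries (covered later in this article) exist to close: they give producers and consumers a shared, versioned contract to validate against.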

Another area where developers might have trouble working with Kafka is in protocol support. Because Kafka works in a Java Virtual Machine (JVM) ecosystem, the main programming language of the client is Java. This could be a problem if your preferred language is Python or C, for example. 

While there are open source clients available in other languages, these don't ship with Kafka itself. You'll have to set up and update these clients manually to get full protocol support.

3) Spinning up Kafka connectors takes time and energy

To move large collections of data into and out of Apache Kafka, there's a tool called Kafka Connect. Kafka Connect lets you build and run connectors (components that define where data should be copied from and to) for your Kafka cluster.
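
To give a sense of what configuring one looks like, here's a minimal sketch of a standalone source connector using FileStreamSource, a demo connector that ships with Kafka. The file path and topic name are placeholders; in a properties file, comments must sit on their own lines.

    # file-source.properties: a minimal standalone source connector sketch.
    name=local-file-source
    connector.class=org.apache.kafka.connect.file.FileStreamSourceConnector
    tasks.max=1
    # Source file to tail
    file=/tmp/app.log
    # Kafka topic to copy each line into
    topic=app-logs

In standalone mode, you'd pass this file to bin/connect-standalone.sh along with a worker configuration, and repeat the exercise for every source and sink you need.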

Sounds great, right?

The problem is, manually creating or running connectors for Kafka Connect clusters can take up a lot of operational bandwidth that you might not have. Your team will have to spin up these connectors, provision the right infrastructure for them, and generally deal with the day-to-day operations of the cluster. Focusing on larger business challenges becomes difficult when you and your teams have to run and operate Kafka Connect at this low level.

What can help solve these challenges?

Red Hat OpenShift Streams for Apache Kafka is designed to alleviate the pain of doing all of this administration work for Kafka.

With OpenShift Streams for Apache Kafka, you don't have to spend time getting Kafka up and running. You can get started with Kafka right away with a free trial (no strings attached, no credit card required) and connect to it from any application.

And if you decide to stick with OpenShift Streams for Apache Kafka, you also don't need to worry about maintaining it day to day. Red Hat's 24/7 global Site Reliability Engineering team fully manages daily operations to proactively address issues and quickly solve problems, including:

  • Monitoring

  • Logging

  • Upgrades

  • Patching

Red Hat Cloud Services delivers a streamlined developer experience and ensures consistency in data handling across applications. Red Hat OpenShift Service Registry, for example, enables development teams to publish, discover, and communicate using well-defined data schemas with Apache Kafka. 

Lastly, while OpenShift Streams for Apache Kafka does not yet handle running and operating Kafka connectors, that capability is on the roadmap. Be sure to check the OpenShift Streams for Apache Kafka homepage for updates and more resources.


About the author

Bill Cozens is a recent UNC-Chapel Hill grad interning as an Associate Blog Editor for the Red Hat Blog.
