Why run Apache Kafka on Kubernetes?

Deploying Apache Kafka on a container orchestration platform like Kubernetes allows event-driven applications to be automated, scaled, and deployed anywhere. In short, Kubernetes magnifies the inherent flexibility of apps built on Apache Kafka.

Enterprise IT is increasingly adopting microservices and cloud-native development, resulting in distributed systems built on event-driven architecture (EDA). In this dynamic development environment, many digital leaders are combining Apache Kafka with Kubernetes.

Apache Kafka enables users to view and analyze a business in real time, and react quickly to continuously changing market situations. In addition, Apache Kafka is an excellent option for establishing and maintaining real-time connectivity with internal stakeholders and external partners, suppliers and customers.

Kafka Streams, a client library within Apache Kafka that can be embedded in an application, enables simple and powerful stream processing of Kafka events. This continuous, concurrent processing and analysis of very large quantities of data on the fly is where Apache Kafka is truly differentiated from other messaging alternatives. Apache Kafka lets users aggregate, transform, enrich, and organize events for in-line, real-time analysis, rather than waiting for big data machinery to crunch the numbers. This makes Apache Kafka valuable for any application that requires immediate responses to real-time data.
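The enrich-then-aggregate pattern described above can be sketched in plain Python. This is a conceptual illustration only, not the Kafka Streams API (which is a Java library). The event fields, the `REGION_BY_STORE` lookup table, and the `enrich` and `running_totals` helpers are hypothetical names invented for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical lookup table used to enrich raw events.
REGION_BY_STORE = {"s1": "east", "s2": "west"}


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Attach a region to each event as it arrives (a stateless transform)."""
    for event in events:
        region = REGION_BY_STORE.get(event["store"], "unknown")
        yield {**event, "region": region}


def running_totals(events: Iterable[dict]) -> Iterator[Tuple[str, float]]:
    """Emit an updated per-region total after every event (a stateful aggregate)."""
    totals: Dict[str, float] = defaultdict(float)
    for event in events:
        totals[event["region"]] += event["amount"]
        yield event["region"], totals[event["region"]]


# A small in-memory list stands in for an unbounded stream of order events.
orders = [
    {"store": "s1", "amount": 10.0},
    {"store": "s2", "amount": 5.0},
    {"store": "s1", "amount": 2.5},
]
updates = list(running_totals(enrich(orders)))
```

Because both steps are generators, each event flows through enrichment and aggregation as soon as it arrives, which mirrors the in-line analysis the paragraph describes rather than a batch job run after the fact.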

Apache Kafka is an ideal foundation for cloud-native development. Cloud-native applications are event driven, and Apache Kafka is a natural backbone for managing events. Distributed streaming, real-time processing, and high scalability are all core event-driven capabilities that Apache Kafka enables.

Serverless architecture, the next step after cloud native, is also event based and is enabled by Apache Kafka. Developers can rely on Apache Kafka on Kubernetes to provide scalable serverless notifications, inter-process communications, and visibility of serverless functions.

Apache Kafka is frequently deployed on the Kubernetes container management system, which is used to automate deployment, scaling, and operation of containers across clusters of hosts. Apache Kafka on Kubernetes goes hand-in-hand with cloud-native development, the next generation of application development. Cloud-native applications are independent, loosely coupled, and distributed services that deliver high scalability via the cloud. In the same way, the event-driven applications built on Kafka are loosely coupled and designed to scale across a distributed hybrid cloud environment.

A key benefit for operations teams of running Apache Kafka on Kubernetes is infrastructure abstraction: it can be configured once and run everywhere. Operations teams typically manage a diverse array of on-premises and cloud resources, and Kubernetes allows them to treat these assets as pools of compute resources to which they can allocate their software, including Apache Kafka. Furthermore, this same Kubernetes layer provides a single environment for managing all of their Apache Kafka instances.

The inherent scalability of Kubernetes is a natural complement to Apache Kafka. Kubernetes allows applications to scale resources up and down with a simple command, or scale automatically based on usage, to make the most economical use of computing, networking, and storage resources. Kubernetes also offers Apache Kafka the portability to span on-premises and public, private, or hybrid clouds, and to use different operating systems.
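Usage-based scaling can be illustrated with the proportional rule documented for the Kubernetes Horizontal Pod Autoscaler: the desired replica count is the current count multiplied by the ratio of the observed metric to its target, rounded up. The sketch below is a simplified model under stated assumptions: the function name and the default bounds are hypothetical, and it leaves out the tolerance band and stabilization behavior a real autoscaler applies.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,   # hypothetical lower bound for this example
    max_replicas: int = 10,  # hypothetical upper bound for this example
) -> int:
    """Proportional scaling: ceil(current * observed / target), clamped to bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Three replicas running at 90% CPU against a 60% target scale out to five.
scaled_out = desired_replicas(3, 90, 60)
# Four replicas running at 30% CPU against a 60% target scale in to two.
scaled_in = desired_replicas(4, 30, 60)
```

A rule like this fits stateless workloads such as Kafka consumer applications most directly. Changing the number of Kafka brokers also involves moving partition data, so it is usually handled with more care.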

Operating Apache Kafka manually is a complex endeavor that requires extensive configuration of many components. Running Apache Kafka on bare metal (or virtual machines, for that matter) means deploying, monitoring, updating, and rolling back nodes by hand, and each of these tasks is difficult.

Solving this complexity is where the Strimzi open source project enters the picture. Strimzi uses operators to deploy Apache Kafka configurations smoothly. Operators, a state-of-the-art mechanism for deploying and managing applications on Kubernetes, provide development flexibility because they abstract at the infrastructure level, allowing developers to deploy applications without much information about the infrastructure. The developer doesn't need to know technicalities such as how many machines or what type of hardware is involved, because operators automatically provision the infrastructure and manage the details.

Strimzi offers the benefits of Infrastructure as Code (IaC): the developer writes a declarative, code-like description of the desired infrastructure, and Strimzi carries out those instructions. Strimzi can even simplify the deployment of Apache Kafka in high availability modes, which is otherwise difficult.
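The declarative model described above, where a developer states the desired cluster and an operator works out how to get there, can be sketched as a single reconcile step. This is a conceptual illustration of the operator pattern in general, not Strimzi's implementation. The `ClusterState` type, the `reconcile` function, and the action strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClusterState:
    """A deliberately tiny model of a Kafka cluster (hypothetical fields)."""
    brokers: int
    version: str


def reconcile(desired: ClusterState, observed: ClusterState) -> List[str]:
    """Return the actions needed to move the observed state to the desired state."""
    actions: List[str] = []
    if observed.brokers < desired.brokers:
        actions.append(f"add {desired.brokers - observed.brokers} broker(s)")
    elif observed.brokers > desired.brokers:
        actions.append(f"remove {observed.brokers - desired.brokers} broker(s)")
    if observed.version != desired.version:
        actions.append(f"rolling update to {desired.version}")
    return actions


# The developer declares what they want; the operator derives the steps.
plan = reconcile(
    desired=ClusterState(brokers=3, version="3.7.0"),
    observed=ClusterState(brokers=1, version="3.6.0"),
)
```

An operator runs a step like this repeatedly, so the same declaration also repairs drift: if a broker disappears, the next pass notices the difference and schedules a replacement.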

Security is another important reason to run Strimzi. Its operators automate security for Apache Kafka on Kubernetes, including single sign-on, encryption, and authentication, so the developer does not have to spend time implementing basic security features.

Streams for Apache Kafka, part of Red Hat Integration, is a Red Hat enterprise distribution of Apache Kafka and the Strimzi project. Much of the extra value that streams for Apache Kafka brings to Apache Kafka is focused on the use of Apache Kafka on Kubernetes, or Red Hat OpenShift, which is the Red Hat distribution of Kubernetes.

Streams for Apache Kafka on OpenShift delivers Apache Kafka on Kubernetes to enable enterprise-grade, event-driven architectures that support distributed data streams and stream-processing, microservices-based applications. Streams for Apache Kafka is particularly well suited to high-scale, high-throughput scenarios because the inherent partitioning in Apache Kafka helps address scalability requirements.
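How partitioning helps with scale can be sketched with a simple keyed partitioner: each record key is hashed to one of a fixed number of partitions, so the load spreads across partitions while records with the same key stay together. This is a minimal sketch under stated assumptions. CRC32 is used here as a stand-in hash (Kafka's Java client hashes keys with murmur2), and the `partition_for` helper and the `customer-N` keys are hypothetical.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a record key to a partition using a stable hash (CRC32 as a stand-in)."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Spread 1,000 hypothetical customer keys over 6 partitions and count the load.
keys = [f"customer-{i}" for i in range(1000)]
load = Counter(partition_for(key, 6) for key in keys)
```

Because the mapping is deterministic, every event for a given customer lands in the same partition, which preserves per-key ordering. Throughput grows by adding partitions and letting more consumers read them in parallel.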
