Streaming data is the continuous flow of real-time information, and the foundation of the event-driven architecture software model. Modern applications can use streaming data to enable data processing, storage, and analysis.
One way to think about streaming data is as a running log of changes or events that have occurred to a data set, often one that is changing at an extremely high velocity.
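To make the running-log idea concrete, here is a minimal Python sketch in which the current state of a data set is rebuilt by replaying its events in order. The `Event` class, its fields, and the `replay` function are hypothetical names chosen for illustration; they do not come from any particular streaming product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str    # hypothetical: the item whose quantity changed
    delta: int  # hypothetical: the change applied to that quantity


def replay(log):
    """Rebuild current state by applying every event in log order."""
    state = {}
    for event in log:
        state[event.key] = state.get(event.key, 0) + event.delta
    return state


# The log only ever grows; existing entries are never modified.
log = []
log.append(Event("widget", +5))
log.append(Event("gadget", +2))
log.append(Event("widget", -3))

state = replay(log)
print(state)  # {'widget': 2, 'gadget': 2}
```

Because the log keeps every change rather than only the latest value, a new consumer can start from the beginning and arrive at the same state as everyone else.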
The large, fast-moving data sets that can be sources of streaming data are as varied as financial transactions, Internet of Things (IoT) sensor data, logistics operations, retail orders, and hospital patient monitoring. Like a next generation of messaging, data streaming is suited for situations that demand real-time responses to events.
One example of streaming data is event data, which forms the foundation of event-driven architecture. Event-driven architecture brings together loosely coupled microservices as part of agile development.
When you think of streaming data, think of real-time applications. Some common use cases include:
- Digital experiences that rely on immediate access to information.
- Microservices applications that support agile software development.
- Streaming scenarios that modernize database-driven applications that previously relied on batch processing.
- Real-time analytics, especially ones that ingest data from multiple sources.
- Edge computing that brings together data from diverse and disparate devices and systems.
Apps built around messaging, geolocation, stock trades, fraud detection, inventory management, marketing analytics, IT systems monitoring, and industrial IoT data are some popular use cases for data streams.
Apache Kafka is an open-source distributed messaging platform that has become one of the most popular ways to work with large quantities of streaming, real-time data.
Software developers use Kafka to build data pipelines and streaming applications. With Kafka, applications can:
- Publish and subscribe to streams of records.
- Store streams of records.
- Process records as they occur.
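The three capabilities above can be sketched with a small in-memory model. This is not Kafka's client API; `Topic`, `Consumer`, `publish`, and `poll` are hypothetical stand-ins, and a real deployment would use a Kafka client library talking to a broker. The sketch only shows the shape of the idea: records are appended to a stored stream, and each subscriber tracks its own read position.

```python
class Topic:
    """Hypothetical in-memory stand-in for a stored stream of records."""

    def __init__(self):
        self._records = []  # records are kept in arrival order

    def publish(self, record):
        """Append a record and return its position (offset) in the stream."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset):
        """Return all stored records at or after the given offset."""
        return self._records[offset:]


class Consumer:
    """Hypothetical subscriber that remembers how far it has read."""

    def __init__(self, topic):
        self._topic = topic
        self._offset = 0

    def poll(self):
        """Return records not yet seen by this consumer."""
        records = self._topic.read_from(self._offset)
        self._offset += len(records)
        return records


orders = Topic()
orders.publish({"order_id": 1, "total": 20})
orders.publish({"order_id": 2, "total": 35})

# Two independent subscribers read the same stored stream.
billing = Consumer(orders)
analytics = Consumer(orders)

billing_totals = [r["total"] for r in billing.poll()]
orders.publish({"order_id": 3, "total": 5})
analytics_sum = sum(r["total"] for r in analytics.poll())
billing_new = billing.poll()  # only the record billing has not yet seen
```

Because the stream is stored rather than handed off to a single receiver, `billing` and `analytics` each process every order at their own pace without affecting one another.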
Kafka is designed to manage streaming data while being fast, horizontally scalable, and fault-tolerant. Because Kafka minimizes the need for point-to-point integrations for data sharing in certain applications, it can reduce latency to milliseconds. This means data is available to users faster, which can be advantageous in use cases that require real-time data availability, such as IT operations and e-commerce.
Apache Kafka can handle millions of data points per second, which makes it well suited for big data challenges. In many data processing use cases, such as the IoT and social media, data volume is increasing exponentially and may quickly overwhelm an application that was sized for today's data volume.
By definition, data streams must deliver sequenced information in real time. Streaming data applications depend on streams that are consistent and highly available, even during times of high activity. Delivering or consuming a data stream that has these qualities can be challenging.
The amount of raw data in a stream can surge rapidly. Consider the sudden exponential growth of new data created by stock trades during a market selloff, social media posts during a big sporting event, or log activity during a system failure. Data streams must be scalable by design. Even during times of high activity, they need to prioritize proper data sequencing, data consistency, and availability. Streams also must be designed for durability in the event of a partial system failure.
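One common way to scale a stream while keeping related events in sequence is to split it into partitions and route each record by a key. The following is a minimal sketch of that idea under stated assumptions: `partition_for`, the partition count of 4, and the sample trade records are all hypothetical, and the SHA-256 hash is used only because it is stable across runs, not because any particular streaming platform uses it.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index using a hash that is stable across runs."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


NUM_PARTITIONS = 4  # assumption for illustration
partitions = [[] for _ in range(NUM_PARTITIONS)]

# Hypothetical trade events: (stock symbol, sequence number for that symbol).
trades = [("ACME", 1), ("INIT", 1), ("ACME", 2), ("ACME", 3), ("INIT", 2)]

for symbol, seq in trades:
    partitions[partition_for(symbol, NUM_PARTITIONS)].append((symbol, seq))

# Every event for a given symbol lands in one partition, in arrival order.
acme_partition = partitions[partition_for("ACME", NUM_PARTITIONS)]
acme_sequence = [seq for symbol, seq in acme_partition if symbol == "ACME"]
```

Adding partitions spreads a surge of records across more workers, while the per-key routing keeps the events for any single key in their original order.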
Across a distributed hybrid cloud environment, a streaming data cluster demands special considerations. Typical streaming data brokers are stateful, and their state must be preserved in the event of a restart. Scaling requires careful orchestration to make sure messaging services behave as expected and no records are lost.
Why use a streaming data service?
The challenge of delivering a complex, real-time, highly available streaming data platform can consume significant resources. It often takes expertise and hardware beyond the capabilities of an in-house IT organization.
For these reasons, many streaming data users opt for a managed cloud service, in which infrastructure and system management are offloaded to a service provider. This option helps organizations focus on their core competencies rather than on the management and administration of a complex streaming data solution.