There are many definitions of big data, but they all share common themes around storing, processing, and accessing enterprise data so that decisions can be made better and faster to help increase revenue and customer satisfaction. Big data is a unique combination of high-volume, high-velocity, and high-variety information that needs to be economically processed for business insight and decision making, and approaches to managing big data can vary.
Our approach is focused on reducing the gap between big data and actionable information by cost-effectively making all data available for analytics through a platform for capturing, processing, and integrating big data. The best way enterprises can turn big data into actionable information is by making all enterprise data available for timely and effective analysis. Sounds easy, right? This is actually a huge challenge for IT departments because there are several types of data processed in big data environments by more than one system. Having a consistent infrastructure for supporting these varied data operations is one way to simplify the big data environment.
In today’s typical enterprise there are three types of big data: business-, machine- and human-generated data. Business-generated big data primarily includes business data coming from online transaction processing and data warehouse systems, as well as data from application log files, such as clickstream and web traffic data. Machine-generated big data includes events and messages from devices and sensors, such as routers, RFID, and mobile phones. Human-generated big data includes social media traffic from individuals using Twitter, Facebook, and other applications.
Given the variety of the three types of big data, most enterprises start by extracting, normalizing, and filtering the data so it can be queued and distributed to the right application for analysis at the right time, without treating each data type as a standalone silo. Once the data is loaded into the appropriate analytics platform, enterprises can gain business insight and results in a timely fashion.
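To make the extract, normalize, filter, and distribute steps above more concrete, here is a minimal Python sketch. It is an illustration only, not a description of any Red Hat product: the `ROUTES` mapping, the queue names, and the record layout (`source` and `payload` keys) are all assumptions made for the example.

```python
from collections import defaultdict, deque

# Hypothetical mapping from big data type to the analytics queue that consumes it.
ROUTES = {
    "business": "warehouse",
    "machine": "realtime",
    "human": "batch",
}


def normalize(record):
    """Return a record with a trimmed, lowercase source and a guaranteed payload dict."""
    return {
        "source": str(record.get("source", "")).strip().lower(),
        "payload": record.get("payload") or {},
    }


def is_relevant(record):
    """Keep only records with a known source type and a non-empty payload."""
    return record["source"] in ROUTES and bool(record["payload"])


def distribute(raw_records):
    """Normalize, filter, and queue each record for its target analytics system."""
    queues = defaultdict(deque)
    for raw in raw_records:
        record = normalize(raw)
        if is_relevant(record):
            queues[ROUTES[record["source"]]].append(record)
    return queues
```

In a real deployment the in-process `deque` objects would be replaced by a message broker, and the routing rules would depend on which analytics systems the enterprise actually runs.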
Our enterprise customers generally incorporate a combination of the following categories of analytics systems:
- Massively Parallel Processing (MPP) systems with in-memory databases and/or key/value pair columnar databases that can process data in real-time as it streams in;
- Hadoop clusters for batch processing of sentiment analysis, data visualization, or predictive analysis based on large historical data sets; and
- traditional business intelligence, data warehouse and OLAP systems.
Most big data customers we work with have some combination of the three analytical systems above operating on some combination of the three big data types. The last important step is to present the data leaving these analytics systems to business analysts, and sometimes to consumer-facing applications, in a common format. Data virtualization plays a critical role in broadening the usage of big data for analytical and operational uses by abstracting different data sources through a single data access layer, which then delivers integrated information as data services to users and applications in real time or near real time. Only after the data is integrated with existing enterprise information systems can the business get a holistic view across all the data and systems to gain comprehensive business insight and then take action to increase revenue and customer satisfaction.
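The idea of a single data access layer over several sources can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the JBoss Data Virtualization API: the `VirtualLayer` class, its `register` and `query` methods, and the `origin` tag are hypothetical names invented for the example.

```python
class VirtualLayer:
    """Hypothetical single access point over several registered data sources."""

    def __init__(self):
        self._sources = {}

    def register(self, name, fetch):
        """Register a source by name; fetch is a zero-argument callable returning dict rows."""
        self._sources[name] = fetch

    def query(self, predicate=lambda row: True):
        """Return matching rows from every source, each tagged with its origin."""
        results = []
        for name, fetch in self._sources.items():
            for row in fetch():
                if predicate(row):
                    results.append({**row, "origin": name})
        return results


# Usage: consumers ask one layer and never talk to the individual systems.
layer = VirtualLayer()
layer.register("hadoop", lambda: [{"customer": "acme", "sentiment": 0.8}])
layer.register("warehouse", lambda: [{"customer": "acme", "revenue": 1200}])
acme_rows = layer.query(lambda row: row["customer"] == "acme")
```

The design point is that callers depend only on the layer's interface, so a source can be added or swapped without changing the consuming application.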
Sounds simple, but it is not easy to accomplish this outcome cost-effectively, and that is where Red Hat can help. By using a combination of the Red Hat technologies below, our customers can achieve the goal of cost-effectively translating their big data into actionable information. By providing a technology stack that can support the many phases of big data transformation, Red Hat offers a consistent foundation and the necessary tools to tackle even the most complex big data projects.
Red Hat Storage
- Red Hat Storage Server, an open software-defined storage platform that scales to petabytes and serves data to both POSIX-compliant and HDFS-based analytical systems found in a typical big data workflow.
Red Hat Middleware
- JBoss A-MQ to handle real-time and high-volume streaming data, and JBoss BRMS to handle complex event processing and filtering.
- JBoss Data Grid to provide real-time, in-memory distributed caching to feed systems for high velocity data processing.
- JBoss Data Virtualization to integrate data from multiple sources, including Hadoop clusters through Hive connectivity, relational databases, NoSQL, and files, into logical, business-friendly data models that can be easily consumed through standard mechanisms to gain actionable information.
Red Hat Enterprise Linux
- Red Hat Enterprise Linux provides the underlying infrastructure platform with the memory management, high-performance processing, and scalability required by the big data lifecycle in the data center. Red Hat Enterprise Linux brings with it a broad ecosystem of trusted partners and ISVs that can help with implementing big data projects.
Red Hat Enterprise Linux OpenStack Platform
- As both the big data market and technology mature, we expect enterprises to inevitably evolve to lower-cost, more flexible cloud-based infrastructures that provide elastic demand capabilities. Red Hat’s portfolio of open hybrid cloud technologies offers cost-effective, open source solutions to enterprises. Red Hat Enterprise Linux OpenStack Platform, combined with emerging open source projects such as Savanna, can provide big data cloud infrastructure.
All of the Red Hat offerings above are hardened for the enterprise, 100 percent open source, and backed by a robust community of innovative partners. This enables enterprises to use commodity volume economics to cost-effectively turn their big data into actionable information.
We are at the Strata Conference + Hadoop World event this week, so stop by to learn more about Red Hat’s big data solutions and see a demo of Hadoop integration with existing data using Red Hat JBoss Data Virtualization, Red Hat Storage, and Red Hat Enterprise Linux.
The OpenStack® Word Mark and OpenStack Logo are either registered trademarks / service marks or trademarks / service marks of the OpenStack Foundation, in the United States and other countries, and are used with the OpenStack Foundation's permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation or the OpenStack community.
About the author
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver reliable and high-performing Linux, hybrid cloud, container, and Kubernetes technologies. Red Hat helps customers integrate new and existing IT applications, develop cloud-native applications, standardize on our industry-leading operating system, and automate, secure, and manage complex environments.