What are data services?
Data services (sometimes described as Data-as-a-Service) generally refer to small, independent, and loosely coupled functions that enhance, organize, share, or calculate information collected and saved in data storage volumes. Data services amplify traditional data by improving its resiliency, availability, and validity, as well as adding characteristics to data that it doesn't already have natively—like metadata. Data service architectures can involve multiple kinds of data and application services working together to achieve a goal, such as in Intelligent Data-as-a-Service (iDaaS) architectures.
How do data services work?
Data services are self-contained units of software functions that give data characteristics it doesn't already have. Data services can make data more available, resilient, and comprehensible, which makes data more useful to users and programs.
Data service functions turn inputs into outputs. The inputs are varied sets of raw data—data that hasn’t been processed for a specific purpose—configured in its native format and saved in physical, virtual, or cloud-based storage volumes. The outputs are usually:
- Organizational: The consolidation, management, batching, and structure of data, usually pulled from structured (databases), semi-structured (data warehouses), or unstructured (data lakes) sources.
- Transferable: The movement of data from its place of origin across a network to an endpoint, like an application or platform.
- Procedural: The processing of data, usually as part of data modeling, analytics, or artificial intelligence/machine learning (AI/ML) software.
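The three kinds of output above can be sketched as three small, independent functions. This is a minimal illustration only: the record layout, the function names (`organize`, `transfer`, `process`), and the use of a plain list as an "endpoint" are all assumptions made for the example, not part of any particular data service product.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw records, standing in for data read from a storage volume.
RAW_RECORDS = [
    {"region": "east", "amount": 120.0},
    {"region": "west", "amount": 80.0},
    {"region": "east", "amount": 60.0},
]


def organize(records, key):
    """Organizational output: group raw records under a shared key."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record[key]].append(record)
    return dict(grouped)


def transfer(records, endpoint):
    """Transferable output: hand each record to an endpoint (any callable)."""
    delivered = 0
    for record in records:
        endpoint(record)
        delivered += 1
    return delivered


def process(grouped, field):
    """Procedural output: compute a per-group average of one numeric field."""
    return {
        group: mean(row[field] for row in rows)
        for group, rows in grouped.items()
    }


if __name__ == "__main__":
    grouped = organize(RAW_RECORDS, "region")
    inbox = []  # stands in for an application or platform endpoint
    print(transfer(RAW_RECORDS, inbox.append))
    print(process(grouped, "amount"))
```

Each function takes raw data as input and returns one kind of output, so the functions can be combined or replaced independently, which is the loose coupling the definition describes.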
What are data services used for?
Managing stored data
Data services can help manage data at rest, that is, data saved in storage volumes. Data services abstract raw data from its sources, such as customer records from online transactional processing (OLTP) databases, property damage information from data warehouses, and images or videos from data lakes, and then apply governance principles, organization, and maintenance that make the data useful to applications and accessible to users. Data services are an important part of big data strategies because they can make sense of massive collections of structured, semi-structured, and unstructured data spread across many storage locations.
Data services can be used for data in motion, as it moves from its storage origin to an application or platform, usually in real time. Data services can create data pipelines to help data move continuously between multiple endpoints. For example, data services can help organizations shift from batch-oriented data processing to event-driven data processing by operating on data immediately as it is generated. Data services also help ensure data is never actually removed from its origin, so multiple endpoints can use the same data point at once. This can be used to create scalable, event-driven architectures.
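One way to picture this event-driven, non-destructive behavior is an append-only log in which every consumer tracks its own read position. The sketch below is an in-memory toy under stated assumptions: the `EventLog` class and its method names are hypothetical, and a real pipeline would rely on a dedicated streaming platform rather than a Python list.

```python
class EventLog:
    """Append-only log: delivering an event never removes it from the log."""

    def __init__(self):
        self._events = []    # the origin; events are only ever appended
        self._handlers = {}  # consumer name -> callable that receives events
        self._offsets = {}   # consumer name -> index of the next unread event

    def __len__(self):
        return len(self._events)

    def subscribe(self, name, handler):
        """Register a consumer and replay everything published so far."""
        self._handlers[name] = handler
        self._offsets[name] = 0
        self._deliver(name)

    def publish(self, event):
        """Store the event, then push it to every consumer immediately."""
        self._events.append(event)
        for name in self._handlers:
            self._deliver(name)

    def _deliver(self, name):
        # Advance this consumer's offset; other consumers are unaffected.
        handler = self._handlers[name]
        while self._offsets[name] < len(self._events):
            handler(self._events[self._offsets[name]])
            self._offsets[name] += 1


if __name__ == "__main__":
    log = EventLog()
    dashboard, audit = [], []  # two hypothetical endpoints
    log.subscribe("dashboard", dashboard.append)
    log.publish({"order_id": 1})
    log.subscribe("audit", audit.append)  # late subscriber gets a replay
    log.publish({"order_id": 2})
    print(dashboard == audit, len(log))
```

Because each consumer only advances its own offset, both endpoints see the same events and the log itself is left intact, in contrast to a batch job that drains a queue on a schedule.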
Data services can help put active data to use in data science, data analytics, and data modeling software. Data services help improve data access to high-performance, intelligent data processing platforms—like AI/ML and deep learning tools. Depending on the data service, data in action could involve collections of small, independent, and loosely coupled services—usually packaged in containers and orchestrated by a Kubernetes platform.
Traditional storage vs. data services
- Traditional storage: The actual collection and retention of raw digital information, meaning the bits and bytes behind applications, network protocols, documents, media, address books, user preferences, and more. When you save a document and select a location, you are going through the process of data storage. A user’s view into data storage is usually at the infrastructure level, and is rarely connected between storage volumes. For example, there’s usually not a native way to view every file, block, or object saved across a workstation, cloud storage provider, and external hard drive, which makes exploring data storage very manual and monolithic.
- Data services: Software that uses data saved in traditional data storage volumes as inputs to create specific outputs, or software that amplifies traditional data by improving its resiliency, availability, and validity. Users typically interact with data services as part of an application, making the process very flexible and customizable. For example, the data service provided by Red Hat® OpenShift® Data Foundation abstracts storage infrastructure so data can be stored in many different places but act as a single persistent repository.
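The idea of many storage locations acting as a single repository can be illustrated with a small facade. This is a conceptual sketch only and does not represent how Red Hat OpenShift Data Foundation is implemented: the `UnifiedRepository` class, the backend names, and the "fewest items" placement rule are all assumptions chosen for the example.

```python
class UnifiedRepository:
    """Presents several storage backends as one key-value repository."""

    def __init__(self, backends):
        # backends: mapping of backend name -> dict-like store (hypothetical)
        self._backends = backends
        self._locations = {}  # key -> name of the backend holding it

    def put(self, key, value, backend=None):
        """Store a value; by default pick the backend holding the fewest items."""
        name = backend or min(
            self._backends, key=lambda n: len(self._backends[n])
        )
        previous = self._locations.get(key)
        if previous is not None and previous != name:
            # Keep exactly one copy when a key moves between backends.
            del self._backends[previous][key]
        self._backends[name][key] = value
        self._locations[key] = name
        return name

    def get(self, key):
        """Fetch a value without the caller knowing where it is stored."""
        return self._backends[self._locations[key]][key]

    def keys(self):
        """List every key across all backends in one sorted view."""
        return sorted(self._locations)


if __name__ == "__main__":
    repo = UnifiedRepository(
        {"workstation": {}, "cloud": {}, "external_drive": {}}
    )
    repo.put("report.txt", b"q3 numbers")
    repo.put("photo.jpg", b"image bytes")
    print(repo.keys())
    print(repo.get("photo.jpg"))
```

The caller works with one set of keys while the facade decides where each item physically lives, which is the single view that traditional storage does not provide natively.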
Why Red Hat?
Red Hat’s solutions help you support every aspect of cloud-native application development, including data services, allowing you to continuously deliver new features to your customers.
Red Hat Cloud Services offerings include platforms like Red Hat OpenShift Data Science, which provides a fully supported environment to rapidly develop, train, and test machine learning (ML) models in the public cloud before deploying in production.