Data services (or Data-as-a-Service) are collections of small, independent, and loosely coupled functions that enhance, organize, share, or calculate information collected and saved in data storage volumes. Data services amplify traditional data by improving its resiliency, availability, and validity, as well as adding characteristics to data that it doesn't already have natively—like metadata.
Data services are self-contained units of software functions that give data characteristics it doesn't already have. Data services can make data more available, resilient, and comprehensible, which makes data more useful to users and programs.
Data service functions turn inputs into outputs. The inputs are varied sets of raw data—data that hasn't been processed for a specific purpose—kept in their native formats and saved in physical, virtual, or cloud-based storage volumes. The outputs are usually:
- Organizational: The consolidation, management, batching, and structure of data, usually pulled from structured (databases), semi-structured (data warehouses), or unstructured (data lakes) sources.
- Transferable: The movement of data from its place of origin across a network to an endpoint, like an application or platform.
- Procedural: The processing of data, usually as part of data modeling, analytics, or artificial intelligence/machine learning (AI/ML) software.
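The input-to-output model above can be sketched in a few lines. This is an illustrative example, not any real data-service API: a hypothetical function normalizes raw records with inconsistent field names (the unprocessed input) and consolidates them into uniform batches (an organizational output).

```python
# Minimal sketch of a data service function: raw, unprocessed records in
# go in; organized, consistently structured batches come out.
# normalize_record and batch_records are illustrative names only.

from typing import Iterable, Iterator

def normalize_record(raw: dict) -> dict:
    """Map varied raw field names onto one consistent schema."""
    return {
        "id": raw.get("id") or raw.get("ID"),
        "value": raw.get("value") or raw.get("val"),
    }

def batch_records(raw_records: Iterable[dict], size: int) -> Iterator[list]:
    """Consolidate normalized records into fixed-size batches."""
    batch = []
    for raw in raw_records:
        batch.append(normalize_record(raw))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

# Raw data arrives in inconsistent native formats...
raw = [{"ID": 1, "val": "a"}, {"id": 2, "value": "b"}, {"ID": 3, "val": "c"}]
# ...and leaves as organized, uniformly structured batches.
batches = list(batch_records(raw, size=2))
```

The same shape applies to the other output types: a transferable service would move each batch to an endpoint, and a procedural service would feed it into an analytics or AI/ML pipeline.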
Data at rest
Data saved in storage volumes. Data services abstract raw data from their sources—like customer records from online transactional processing (OLTP) databases, property damage information from data warehouses, and images or videos from data lakes—and apply governance principles, organization, and maintenance that make data useful to applications and accessible by users. Data services are an important part of big data strategies because they can make sense of massive collections of structured, semi-structured, and unstructured data stored across many disparate locations.
Data in motion
Data moving from its storage origin to an application or platform, usually in real time. Data services can create data pipelines to help data move continuously between multiple endpoints. For example, data services can help organizations shift from batch-oriented data processing to event-driven data processing by operating on data immediately as it is generated. Data services also help ensure data is never actually removed from its origin—allowing multiple endpoints to use the same data point at once. This can be used to create scalable, event-driven architectures.
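The event-driven pattern described above can be sketched with an in-memory publish/subscribe broker. This is a toy stand-in for a real messaging system such as Apache Kafka: each event is handled the moment it is produced (rather than waiting for a batch), and delivering an event to one endpoint does not remove it, so several endpoints consume the same data point.

```python
# Minimal sketch of event-driven data movement with non-destructive
# fan-out. The Broker class is hypothetical; production systems would
# use a real event streaming platform.

from typing import Callable

class Broker:
    def __init__(self) -> None:
        self.subscribers: list = []

    def subscribe(self, handler: Callable[[dict], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, event: dict) -> None:
        # Fan out: every subscribed endpoint receives the same event,
        # and publishing does not consume or delete it.
        for handler in self.subscribers:
            handler(event)

broker = Broker()
analytics, audit_log = [], []
broker.subscribe(lambda e: analytics.append(e["temp"]))  # endpoint 1
broker.subscribe(lambda e: audit_log.append(e))          # endpoint 2

# Events are processed immediately as they are generated,
# not accumulated into a nightly batch.
for reading in ({"temp": 20}, {"temp": 21}):
    broker.publish(reading)
```

After the loop, both endpoints have seen both readings: the analytics list holds the extracted temperatures while the audit log holds the full events, illustrating how one generated data point can serve multiple consumers at once.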
Data in action
Active data grouped into data sets being used by data science, data analytics, and data modeling software. Data services help improve data access to high-performance, intelligent data processing platforms—like AI/ML and deep learning tools. Depending on the data service, data in action could involve collections of small, independent, and loosely coupled services—usually packaged in containers and orchestrated by a Kubernetes platform.
Cloud-native application development is impossible without data services that help developers and data scientists collaborate as data moves between systems. Multiple code commits that use the same data can extend build times, but a data service like Red Hat® OpenShift® Data Foundation can reduce time dependencies on concurrent builds.
Data storage
The actual collection and retention of raw digital information—the bits and bytes behind applications, network protocols, documents, media, address books, user preferences, and more. When you save a document and select a location, you are going through the process of data storage. A user's view into data storage is usually at the infrastructure level, and is rarely connected between storage volumes. For example, there's usually no native way to view every file, block, or object saved across a workstation, cloud storage provider, and external hard drive—making the act of exploring data storage manual and siloed.
Data services
Software that uses data saved in traditional data storage volumes as inputs to create specific outputs, or software that amplifies traditional data by improving its resiliency, availability, and validity. Users typically interact with data services as part of an application, making the process very flexible and customizable. For example, the data service provided by Red Hat OpenShift Data Foundation abstracts storage infrastructure so data can be stored in many different places—but act as a single persistent repository.
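The "many places, one repository" idea can be illustrated with a small sketch. This is hypothetical code, not the OpenShift Data Foundation API: several independent backends (plain dictionaries standing in for separate storage volumes) sit behind one interface, so the application reads and writes a single namespace and never needs to know which volume holds a given object.

```python
# Minimal sketch of storage abstraction: multiple backends presented
# to the application as one persistent repository. UnifiedRepository
# is an illustrative name, not a real product class.

class UnifiedRepository:
    def __init__(self, backends: dict) -> None:
        self.backends = backends          # volume name -> key/value store
        self.index = {}                   # object key -> volume holding it

    def put(self, key: str, value: bytes, backend: str) -> None:
        """Write an object to a specific volume, recording its location."""
        self.backends[backend][key] = value
        self.index[key] = backend

    def get(self, key: str) -> bytes:
        """Read by key alone; the service resolves which volume to use."""
        return self.backends[self.index[key]][key]

# Two separate "volumes" behind one repository.
repo = UnifiedRepository({"local": {}, "cloud": {}})
repo.put("invoice-42", b"pdf bytes", backend="cloud")
repo.put("avatar-7", b"png bytes", backend="local")

document = repo.get("invoice-42")  # caller never names a volume
```

The design point is the index: because location is resolved inside the service, volumes can be added, moved, or swapped without changing application code.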
The Massachusetts Open Cloud (MOC) uses data services. The MOC is a nonprofit initiative of universities, government organizations, and businesses. It was formed to develop a common, cloud-based infrastructure for businesses, governments, and nonprofits to analyze big data. The MOC used Red Hat Ceph Storage—a software-defined storage service—to organize and share large amounts of data with multiple entities running custom data analytics platforms.
With no prior experience with OpenShift Container Storage, our team was able to set up 2 distinct OpenShift clusters and conduct full Db2 Warehouse Performance validation in less than 2 weeks.
Because our data services not only work well with every data storage provider, but are also built to complement cloud-native application development.
So use any datacenter or cloud you want, and start feeding all that data into your ever-evolving cloud-native apps. With our data services, your enterprise's existing data can be enhanced and streamed right into your cloud-native apps to reveal important information that may solve tomorrow's biggest challenges.
Check out how Red Hat Ceph Storage performed as part of Evaluator Group’s 10 billion object test.