
Challenges and considerations for architecting machine learning infrastructure

Machine learning models require unique architectural considerations compared to other IT infrastructure. Consider these challenges and ways to overcome them to succeed.

Every enterprise aspires to be data-driven. Machine learning is considered a path to the Holy Grail of business insights by leveraging big data to increase profitability and provide a competitive advantage. However, despite the value machine learning models bring to enterprises, they often underperform. While there is a tremendous amount of energy invested in deciding which aspects of the business to improve and which data features to include, the realities of production are overlooked.

[ What is edge machine learning? ]

To ensure long-term reliability and relevant insights, machine learning models need a data architect or enterprise architect to consider the complexities of the operational environment.

Machine learning model challenges

There are numerous challenges when considering machine learning architecture.

Machine learning models quickly degrade. The moment a machine learning model is put into production, new business realities can conflict with the model's assumptions. Business fluctuations are often significant enough that data considered critical for the model during creation is no longer relevant by the time it's fully implemented in production. The same holds true for other aspects, such as training. Essential data features or analysis metrics derived from raw data may be miscalibrated, irrelevant, or missing.

Models lose their relevance from day one.

Assumptions concerning the production infrastructure capabilities can also limit accuracy. These assumptions include imposed limitations on the sample and time window size, leading to bias and pattern misidentification during recurring cycles.

Once in production, additional model prediction inaccuracies can arise due to limitations in gathering and ingesting fresh data from different sources in real time. Models are typically developed by data scientists using synchronous historical data that is available offline. When the model is live, suddenly everything is asynchronous. Certain data or calculations are difficult to source or simply cannot be found. Additionally, a specific calculation that was assigned a high correlation by data scientists can require too many compute resources, preventing the system from meeting service level agreements (SLAs).
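One way to keep a compute-heavy feature from breaking an SLA is to guard it with a latency budget and fall back to a cheaper approximation when the budget is blown. The sketch below illustrates the idea; the budget value, feature functions, and fallback strategy are all hypothetical, not part of any particular platform's API.

```python
import time

# Hypothetical per-request latency budget (in ms) derived from the SLA.
SLA_BUDGET_MS = 50


def expensive_correlation_feature(record):
    # Stand-in for a compute-heavy calculation the data scientists
    # rated as highly correlated with the target.
    return sum(v * v for v in record) / len(record)


def cheap_fallback_feature(record):
    # Cheaper approximation served when the budget is exceeded.
    return record[-1]


def compute_feature(record):
    """Compute the preferred feature; fall back if it blows the SLA budget."""
    start = time.perf_counter()
    value = expensive_correlation_feature(record)
    elapsed_ms = (time.perf_counter() - start) * 1000
    if elapsed_ms > SLA_BUDGET_MS:
        # In a real system, log the violation so the feature can be
        # re-evaluated or precomputed offline.
        return cheap_fallback_feature(record)
    return value
```

In practice, the violation log becomes the feedback loop that tells data scientists which of their offline assumptions do not survive contact with production.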

[ Learn how to accelerate machine learning operations (MLOps) with Red Hat OpenShift. ]

Models may be unable to read unstructured data in web pages, graphics, posts, comments, videos, and audio clips. This can limit the wisdom of the insights since unstructured data often includes valuable information. For example, it can be critical to assess market sentiment for stock values or to discover the keywords that provide the best value for digital online campaigns. Data models need the ability to add new data features on the fly, even if they are unstructured, without requiring additional steps to normalize the data.

A model reality check

Data scientists, or the data architects supporting their team, must have the ability to assess the production environment while the model is being built. Future problems can be avoided by creating environment parity between model training and production.

A smart Operational Data Store (ODS) powered by in-memory computing can quickly access all data structures from various platforms. Data scientists can then scale up to experience real-life data volumes. An ODS can provide the tools needed for managing hybrid and multi-cloud deployments on both public and private clouds. When architected well, it provides a unified speed layer and API that aggregates data inputs from systems of record, legacy databases, and cloud-based data stores while offloading digital applications from the data sources. Together with in-memory speed and high concurrency support, data scientists can enjoy high performance and consistent response times around the clock.

The cost of this kind of architecture depends significantly on performance requirements; it shrinks or balloons according to those needs. Business rules defined to meet SLAs can automatically move data from the fastest but most expensive RAM layer to SSD and then to a persistent data store.
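A tiering rule of this kind can be as simple as a function of access recency. The sketch below is a minimal illustration under assumed tier names and time windows; real platforms express such policies declaratively, and the thresholds here are invented for the example.

```python
from datetime import datetime, timedelta

# Hypothetical storage tiers, fastest and most expensive first.
TIERS = ["ram", "ssd", "data_store"]


def choose_tier(last_access: datetime, now: datetime,
                hot_window: timedelta = timedelta(hours=1),
                warm_window: timedelta = timedelta(days=1)) -> str:
    """Illustrative business rule: demote data by access recency
    so that only SLA-critical data occupies the expensive RAM layer."""
    age = now - last_access
    if age <= hot_window:
        return "ram"         # hot data stays in memory to meet tight SLAs
    if age <= warm_window:
        return "ssd"         # warm data moves to cheaper flash
    return "data_store"      # cold data lands in the cheapest layer
```

Tuning the two windows is where cost control happens: widening the hot window buys latency at the price of RAM, and narrowing it does the reverse.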

An ODS can function as a Feature as a Service (FaaS) platform by managing interdependent and ad hoc aggregations external to machine learning models. By keeping data calculations in small blocks of sandboxed code, changes can be implemented more quickly, and features can be calibrated independently. Microservice components can subscribe to any number of event templates. The templates trigger persistent aggregation updates based on real-time data, weighting multiple super-aggregations from multiple stores together with rules received from dependent microservice features downstream.
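The subscribe-and-aggregate pattern described above can be sketched in a few lines. This is a toy, single-process illustration under assumed names (`FeatureService`, `order_placed`, `revenue_total` are all hypothetical), not the API of any particular ODS product; the point is that each aggregation lives in its own small handler that can be changed or recalibrated without touching the others.

```python
from collections import defaultdict


class FeatureService:
    """Toy sketch of a feature microservice: small sandboxed aggregation
    handlers subscribe to event templates and maintain running aggregates."""

    def __init__(self):
        self._handlers = defaultdict(list)    # event template -> handlers
        self.aggregates = defaultdict(float)  # feature name -> running value

    def subscribe(self, template, handler):
        self._handlers[template].append(handler)

    def publish(self, template, payload):
        # Each matching handler updates its own aggregation independently.
        for handler in self._handlers[template]:
            handler(self.aggregates, payload)


def add_revenue(aggregates, payload):
    # One sandboxed calculation block; swapping it out does not
    # affect any other feature subscribed to the same template.
    aggregates["revenue_total"] += payload["amount"]


svc = FeatureService()
svc.subscribe("order_placed", add_revenue)
svc.publish("order_placed", {"amount": 25.0})
svc.publish("order_placed", {"amount": 10.0})
# svc.aggregates["revenue_total"] is now 35.0
```

In a distributed deployment the publish step would be an event bus rather than an in-process loop, but the decoupling between templates and handlers is the same.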

[ Try OpenShift Data Science in our Developer sandbox or in your own cluster. ]

The model's shelf life is extended by enabling it to be trained faster using dynamically scaled, fresh production data and by reducing its calculation scope by parsing raw data into higher-level complex metrics. Features delivered to the model are more accurate because (1) scalable ingestion provides larger arrays of fresh data faster, and (2) fresh data can be augmented by context, including updatable aggregations of extra-long-tail historical data.

Updating models to keep them relevant is one of the most critical factors when selecting which infrastructure to use when putting models into production. In-memory computing technology enables business logic to run in the same memory space as data in a distributed manner. Doing so delivers the extreme performance required for operationalizing models and retraining them at the necessary cadence.
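The "business logic in the same memory space as the data" idea is essentially sending the code to the data instead of pulling records over the network. The minimal sketch below illustrates the routing principle with an invented in-process grid (`InMemoryGrid` and its partitioning scheme are hypothetical); real in-memory data grids run the partitions on separate nodes and ship the function across the wire.

```python
class Partition:
    """One shard of the grid, holding its slice of the data in memory."""

    def __init__(self):
        self.data = {}

    def execute(self, key, fn):
        # The function runs where the data lives: no record is copied
        # out of the partition to be processed elsewhere.
        self.data[key] = fn(self.data.get(key))
        return self.data[key]


class InMemoryGrid:
    """Routes each update to the partition that owns the key."""

    def __init__(self, n_partitions=4):
        self.partitions = [Partition() for _ in range(n_partitions)]

    def _owner(self, key):
        return self.partitions[hash(key) % len(self.partitions)]

    def update(self, key, fn):
        return self._owner(key).execute(key, fn)


grid = InMemoryGrid()
# Colocated logic: increment a counter without moving data off its partition.
grid.update("retrain_count", lambda v: (v or 0) + 1)
```

Because the computation and the data share an address space, the per-update cost is a function call rather than a network round trip, which is what makes frequent retraining cadences affordable.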

Wrap up

With machine learning's popularity on the rise, data architects must consider both the opportunities and the challenges of implementing it. Models quickly lose accuracy, can balloon in cost, and can introduce inaccuracies through their ongoing ingestion of new data. These challenges can, however, be overcome: a combination of the right architectural choices, such as a high-performance in-memory data fabric, and asking the right questions can lead to success. Teams that pair the organization's existing IT and enterprise architects with new data architects are well positioned to achieve it.

Galen Silvestri

I'm Galen Silvestri, Senior Solutions Engineer for Operationalized Data at GigaSpaces Technologies.
