Contributed content from Foutse Khomh and Giuliano Antoniol, Professors in the Department of Computer Engineering and Software Engineering at Polytechnique Montréal, Montréal, Canada.
Red Hat, as an open source community leader, participated alongside AI/ML researchers in the Software Engineering for Machine Learning Applications (SEMLA) initiative. Red Hat shared how organizations can use modern infrastructure built on open technologies, such as agile integration, microservices, and containerized applications, to build and deploy managed, scalable intelligent applications on hybrid clouds.
Organizations eager to adopt AI and machine learning (ML) face significant challenges. One of them is bridging the gap between data science and operations, much as DevOps has done for application development. And just as with DevOps, creating an agile AI/ML environment involves architectural, cultural, and process considerations. For example, as with modern DevOps, open hybrid cloud platforms can allow faster turnaround when refining models, quicker integration of disparate data sources, and easier use of ML capabilities and tools from both providers and third parties that have made their solutions available as containerized services.
AI/ML holds the promise of intelligently automating aspects of business and everyday life. Building, tuning, and training algorithms that learn, however, does not fit neatly into agile development practices. Data scientists seek precision in their work, devising models that best represent the data for a defined goal. Engineers responsible for operations, on the other hand, seek to optimize system performance. This creates a tradeoff in AI/ML: more exacting results can require more data, and more data can demand more processing, which can run counter to the kind of optimization that operations engineering teams are wired to pursue.
Putting a single, standalone model into use isn't the problem. The difficulty arises when organizations depend on interconnected machine learning data and application components, whether accessed through services, built in-house, or a combination of the two, to drive the final, desired results. Data scientists are designing features and training models for applications and systems that were never designed to learn and self-improve.
Many AI/ML methods are data-greedy algorithms. To get enough data, data scientists may turn to third-party data sources to train models. Beyond ongoing data privacy issues, regulated organizations face additional risk when transferring these models to the different environments where that third-party data lives. In a regulated data environment, the data on which a model depends is often encrypted to protect it. Learning on encrypted data in order to make predictions on new, unencrypted data is at the forefront of bringing AI/ML into production. Furthermore, when visibility into the data is limited, the model's behavior becomes harder to interpret.
A trained model can make a wrong decision. Testing model behavior requires a window into the inner workings of the model. When you don't already know the answer, as is the case with learning models, what should be tested? When the model's behavior depends not on the source code but on the training data, what action should QA take? Testing often boils down to whether an answer is acceptable. One can get a sense of a model by examining how it reached a decision. However, models are often just matrices of real numbers. And when data is encrypted, the question of how a decision was made can't be answered, because confidential data is, by definition, not traceable.
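One common way to express "acceptability" in code is a threshold-based acceptance test: instead of asserting an exact expected output for every input, the test checks that the model's accuracy on a labeled holdout set meets an agreed tolerance. The following is a minimal sketch under stated assumptions: the `predict` function, the toy holdout data, and the 0.9 threshold are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical stand-in for a trained model: labels a reading "high" above 0.5.
def predict(x: float) -> str:
    return "high" if x > 0.5 else "low"

def accuracy(model: Callable[[float], str],
             holdout: Sequence[Tuple[float, str]]) -> float:
    """Fraction of holdout examples where the model output matches the label."""
    correct = sum(1 for x, label in holdout if model(x) == label)
    return correct / len(holdout)

def is_acceptable(model: Callable[[float], str],
                  holdout: Sequence[Tuple[float, str]],
                  threshold: float = 0.9) -> bool:
    """Acceptance test: pass if accuracy meets an agreed threshold,
    rather than requiring an exact answer for every input."""
    return accuracy(model, holdout) >= threshold

# Toy labeled holdout set (illustrative values only).
holdout = [(0.9, "high"), (0.1, "low"), (0.7, "high"), (0.3, "low"), (0.55, "low")]

print(accuracy(predict, holdout))                        # 0.8
print(is_acceptable(predict, holdout))                   # False
print(is_acceptable(predict, holdout, threshold=0.75))   # True
```

The design choice here is that the threshold, not the code, encodes what counts as acceptable, so data scientists and QA can agree on it explicitly and revisit it as the training data changes.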
A deeper view of the AI/ML lifecycle is needed as part of the workflow. Model building and deployment have co-dependent constraints that span teams with different areas of expertise. If data scientists understood the challenges of deployment, performance, and application services, they would be better equipped to create enduring models. Likewise, if engineers understood how data is labelled, how it can be tracked, and whether there are access considerations or limitations in different environments, they would be better able to decide how best to deploy the model.
The architecture of the processing environment for AI/ML affects the success of the model, and the model in turn shapes what the processing environment needs. And while a wide range of questions must be answered to reduce the friction, some certainties do exist.
Beginning with the end in mind means starting with an environment built for flexible integration and open to the needs of the different domain experts: those who know the data, those who build the models, and those who deploy them. To share a common language, they need a common foundation: a platform that can provide traceability, lineage, and visibility as part of the workflow across the model pipeline.
Take data labeling as an example. Adding labels (also known as tagging) is the necessary, and often painstaking, first step in ML: the process of manually assigning a label, such as a class or category, to each piece of content used for model training. Examples include classifying an image as showing a tumor or a cyst, flagging text as written by cyber trolls, or identifying the emotional sentiment in an audio clip. For any given project, content can come from a multitude of sources. If the same platform is used throughout, domain experts can label the data, data scientists can access it for training, and the history of each piece of data is retained for the operations team charged with deployment. Not only can this give context for deployment needs, but it also means that the experts who know the data are the ones who label it. As a side benefit, it can also create an auditable journey.
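To make the idea of retaining each item's history more concrete, here is a minimal Python sketch of a labeled data item that carries an append-only lineage log. The class names (`DataItem`, `LineageEvent`), the actors, and the source path are hypothetical and chosen for illustration; a real platform would presumably persist this history in a shared store rather than in memory.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class LineageEvent:
    """One step in a data item's history: who did what, and when."""
    actor: str    # e.g. a domain expert, data scientist, or ops engineer
    action: str   # e.g. "ingested", "labeled", "used_for_training"
    detail: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class DataItem:
    """A piece of content plus its current label and an append-only history."""
    item_id: str
    source: str                       # where the content came from
    label: Optional[str] = None
    history: List[LineageEvent] = field(default_factory=list)

    def record(self, actor: str, action: str, detail: str) -> None:
        # Events are only ever appended, never edited or removed.
        self.history.append(LineageEvent(actor, action, detail))

    def apply_label(self, label: str, labeler: str) -> None:
        """Set the label and log who applied it, keeping the previous value in the trail."""
        previous = self.label
        self.label = label
        self.record(labeler, "labeled", f"{previous!r} -> {label!r}")

    def audit_trail(self) -> List[str]:
        return [f"{e.actor}: {e.action} ({e.detail})" for e in self.history]

# Usage with a hypothetical imaging pipeline.
scan = DataItem(item_id="scan-001", source="hospital-a/imaging")
scan.record("ingest-service", "ingested", "from hospital-a/imaging")
scan.apply_label("cyst", labeler="radiologist-1")
scan.record("data-scientist-1", "used_for_training", "model v1")

for line in scan.audit_trail():
    print(line)
```

Because relabeling appends a new event instead of overwriting history, the operations team can see who labeled each item, what it was labeled before, and which model it fed, which is one way to realize the auditable journey described above.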
Technologies that package an application and its dependencies together as a transferable unit, such as containers, are one approach to creating deployment environments that can scale. Reusable microservices that decompose an application into individual machine learning components, each managed over its own life cycle, can help address issues such as model drift and model reintroduction. And hybrid cloud and multi-cloud environments, which allow models to be trained on data wherever it resides, are emerging as a modern infrastructure approach.
Quality assurance requires an open approach
Introducing quality assurance into AI/ML systems and practicing DataOps will require an open approach: open communication across systems, openness to data, and openness across departments with different types of expertise. Open, robust, and reliable tools and environments, including open hardware architectures, are necessary to foster the development and deployment of AI/ML applications. The benefits of open technology can extend even to the underlying hardware that many AI/ML workloads require.
The open source community, which includes AI/ML innovators, Red Hat, and other organizations, is working to help shape the future of bringing AI to life in applications and to create an open, available deployment framework for ecosystems. This work ranges from hardware and API specifications to curating and sharing data, along with open source machine learning infrastructure and environments. Last but not least, it includes helping to mold existing tools and components into portable, easier-to-use visual environments that support the lifecycle from inception and early experimentation through deployment into the trenches.