In the era of gen AI and rapid machine learning (ML) adoption, enterprise AI is no longer just a research experiment—it’s a core business driver. But as organizations rush to operationalize their AI initiatives, they’re hitting a significant roadblock: deployment and management at scale.
To help bridge the gap between AI innovation and IT operations, Red Hat Ansible Certified Content Collection for Google Cloud provides native support for Google Cloud’s Vertex AI platform. This release enables a shift in how operations and data science teams manage the lifecycle of their AI services, bringing the simplicity of Ansible Automation Platform to the complex world of machine learning.
Currently, many operations teams and data scientists rely on a patchwork of disparate scripts, manual user interface (UI) clicks, or fragmented processes to manage their AI infrastructure. While this might work for a single prototype, it quickly becomes unmanageable at enterprise scale.
This friction creates operational inefficiencies, security compliance gaps, and inconsistent deployments. It’s a significant drag on innovation as models get stuck in the lab, unable to be reliably promoted to production. The new Vertex AI modules address this bottleneck directly by enabling teams to manage the entire AI lifecycle—from dataset definition to model deployment—using declarative, version-controlled automation.
Key capabilities for Vertex AI on Google Cloud
By automating the Vertex AI platform with Ansible Automation Platform, organizations can implement workflows for deploying, configuring, and managing Vertex AI. The new capabilities will help enable reproducibility, auditability, and a unified automation platform across all Google Cloud services.
The new capabilities include:
- Dataset management (gcp_vertexai_dataset): Define a Vertex AI Dataset as code and manage its lifecycle efficiently.
- Pipeline automation: Trigger and monitor a Vertex AI Training Pipeline automatically as part of an automated CI/CD pipeline.
- Model governance (gcp_vertexai_model): Automatically version and register a trained model in the Vertex AI Model Registry, establishing a single source of truth for all models.
- Declarative deployment (gcp_vertexai_endpoint): Deploy a registered model, with support for traffic splitting, enabling safe, controlled releases and consistent provisioning and management across development, staging, and production environments.
- Feature Store management (gcp_vertexai_feature_store): Automate the provisioning of Vertex AI Feature Stores to systematically serve, share, and reuse ML features across your organization.
- Vector search and indexing (gcp_vertexai_index): Deploy and manage Vertex AI indexes to power highly scalable similarity search and retrieval-augmented generation (RAG) applications.
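As a sketch of how these modules follow a common declarative pattern, the task below provisions a vector search index for a RAG application. The parameter names beyond the common ones shown in the deployment example later in this post (display_name, region, project, auth_kind, state) are illustrative assumptions; consult the collection documentation for the authoritative schema.

```yaml
# Hedged sketch: declaratively manage a Vertex AI index for similarity
# search. Values are illustrative, not a confirmed module schema.
- name: Provision a Vertex AI vector search index
  google.cloud.gcp_vertexai_index:
    display_name: "rag_document_index"
    region: "us-central1"
    project: "my-gcp-project"
    auth_kind: "application"
    state: present
```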
Event-driven automation: Enhancing AIOps and MLOps
Combining these new AI infrastructure modules with Event-Driven Ansible, part of Ansible Automation Platform, increases responsiveness, empowering organizations to handle both MLOps and AIOps use cases efficiently.
MLOps (managing the AI): ML models aren’t static assets; their performance degrades over time as real-world data changes. Instead of waiting for a human operator to notice an issue, Event-Driven Ansible can listen for telemetry and alerts from Vertex AI Model Monitoring. If the monitoring system detects "model drift" (a degradation in predictive accuracy as input data shifts), Event-Driven Ansible can automatically:
- Roll back the Vertex AI endpoint to a previous, stable model version.
- Kick off a new Vertex AI Training Pipeline using the latest dataset.
- Open an IT service management (ITSM) ticket (e.g., in ServiceNow) with detailed diagnostic context.
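A minimal Event-Driven Ansible rulebook for this scenario might look like the sketch below. The webhook source is a standard ansible.eda source plugin; the event payload field (event.payload.alert_type) and the playbook path are illustrative assumptions, since in practice the source would be wired to Vertex AI Model Monitoring notifications (for example, via Pub/Sub or an alerting webhook).

```yaml
# Hedged sketch of an Event-Driven Ansible rulebook that reacts to a
# model-drift alert by running a rollback playbook. Payload fields and
# the playbook name are illustrative assumptions.
- name: React to Vertex AI model drift
  hosts: localhost
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Roll back endpoint on drift alert
      condition: event.payload.alert_type == "model_drift"
      action:
        run_playbook:
          name: playbooks/rollback_endpoint.yml
```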
AIOps (AI managing the infrastructure): Conversely, as organizations increasingly rely on AI-driven observability tools to monitor their sprawling IT environments, Event-Driven Ansible serves as the vital "action engine" for those insights. When an AIOps platform detects an anomaly—such as predicting an impending database failure or identifying anomalous network traffic—Event-Driven Ansible can ingest that intelligent alert and instantly trigger a remediation playbook. As part of the remediation workflow, Ansible Automation Platform can query AI agents running on the Vertex platform and use the response to guide remediation and enrich the ITSM ticket relating to the alert. Automated remediation includes scaling resources, isolating compromised networks, or clearing disk space long before a human operator even gets paged.
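A remediation playbook triggered this way is ordinary Ansible. As one hedged example, the play below clears stale temporary files in response to a disk-space alert; the target group, path, and age threshold are illustrative assumptions rather than a prescribed remediation.

```yaml
# Hedged sketch: a remediation playbook an AIOps alert might trigger.
# Host group, paths, and thresholds are illustrative assumptions.
- name: Clear temporary files flagged by an AIOps disk-space alert
  hosts: "{{ target_hosts | default('flagged_servers') }}"
  become: true
  tasks:
    - name: Find temporary files older than two days
      ansible.builtin.find:
        paths: /var/tmp
        age: 2d
        recurse: true
      register: stale_files

    - name: Remove the stale files
      ansible.builtin.file:
        path: "{{ item.path }}"
        state: absent
      loop: "{{ stale_files.files }}"
```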
Together, AIOps insights, the Vertex AI modules, and Event-Driven Ansible shift enterprise operations from reactive troubleshooting to a proactive, self-healing ecosystem.
Example: Automating a Vertex AI Deployment Pipeline
Let's look at what this means in practice. Instead of clicking through the Google Cloud console or maintaining bash scripts, you can define your ML deployment declaratively. Here is an example Ansible Playbook demonstrating how to create a dataset, register a trained model, and deploy a serving endpoint:
```yaml
- name: Deploy Vertex AI ML Pipeline
  hosts: localhost
  gather_facts: false
  vars:
    gcp_project: "my-gcp-project"
    gcp_region: "us-central1"
  tasks:
    - name: Create a Vertex AI Image Dataset
      google.cloud.gcp_vertexai_dataset:
        display_name: "production_image_dataset"
        metadata_schema_uri: "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present

    - name: Register a trained Vertex AI Model
      google.cloud.gcp_vertexai_model:
        display_name: "my_predictive_model_v1"
        artifact_uri: "gs://my-ml-artifacts-bucket/models/v1/"
        container_spec:
          image_uri: "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-8:latest"
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present

    - name: Provision a Vertex AI Serving Endpoint
      google.cloud.gcp_vertexai_endpoint:
        display_name: "production_serving_endpoint"
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present
```

With just a few lines of YAML, we've gone from data configuration to a live AI endpoint. This playbook can be committed to a source control system such as GitHub or GitLab, reviewed by peers, and executed via Ansible Automation Platform, providing complete traceability and repeatability while eliminating "it works on my machine" deployment failures.
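The traffic-splitting support mentioned earlier could extend this pattern into a canary-style rollout. The sketch below is a hedged illustration: the traffic_split parameter and its model-name keys are assumptions modeled on the Vertex AI endpoint API, so verify the exact schema against the module documentation before use.

```yaml
# Hedged sketch of a canary rollout via traffic splitting. The
# traffic_split parameter shape is an illustrative assumption.
- name: Shift 10% of traffic to the new model version
  google.cloud.gcp_vertexai_endpoint:
    display_name: "production_serving_endpoint"
    region: "{{ gcp_region }}"
    project: "{{ gcp_project }}"
    auth_kind: "application"
    traffic_split:
      my_predictive_model_v1: 90
      my_predictive_model_v2: 10
    state: present
```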
By automating the Vertex AI model lifecycle, you can streamline the deployment, updating, and auditing of AI agents and foundation models, bringing consistent and repeatable AI workflows to your entire environment. The launch of these new capabilities is a testament to our commitment to cloud automation, bringing the benefits of automation to cloud-native AI infrastructure management.
For more technical details on the Vertex AI platform, refer to the official Google Cloud documentation and the Red Hat Ecosystem Catalog.
About the author
Matthew Packer is a Principal Product Marketing Manager for Ansible Automation Platform and is responsible for cloud automation. Prior to joining Red Hat, he worked in product marketing specializing in retail payment technology at Vontier and product management at Cisco in cloud-based networking. Matthew also worked as a consultant at Honeywell in the manufacturing and utilities industries with a focus on the Internet of Things (IoT) and predictive analytics space.