AIOps and MLOps made simple: Automating Vertex AI with Red Hat Ansible Automation Platform

March 26, 2026 · Matthew Packer · 4-minute read

In the era of gen AI and rapid machine learning (ML) adoption, enterprise AI is no longer just a research experiment—it’s a core business driver. But as organizations rush to operationalize their AI initiatives, they’re hitting a significant roadblock: deployment and management at scale.

To help bridge the gap between AI innovation and IT operations, Red Hat Ansible Certified Content Collection for Google Cloud now provides comprehensive support for Google Cloud’s Vertex AI platform. This release enables a shift in how operations and data science teams manage the lifecycle of their AI services, bringing the simplicity of Ansible Automation Platform to the complex world of machine learning.

Currently, many operations teams and data scientists rely on a patchwork of disparate scripts, manual user interface (UI) clicks, or fragmented processes to manage their AI infrastructure. While this might work for a single prototype, it quickly becomes unmanageable at enterprise scale.

This friction creates operational inefficiencies, security compliance gaps, and inconsistent deployments. It’s a significant drag on innovation as models get stuck in the lab, unable to be reliably promoted to production. The new Vertex AI modules address this bottleneck directly by enabling teams to manage the entire AI lifecycle—from dataset definition to model deployment to RAG pipeline configuration—using declarative, version-controlled automation.

Key capabilities for Vertex AI on Google Cloud

By automating the Vertex AI platform with Ansible Automation Platform, organizations can implement workflows for deploying, configuring, and managing Vertex AI. The new capabilities will help enable reproducibility, auditability, and a unified automation platform across all Google Cloud services.

The new capabilities include:

Dataset management (gcp_vertexai_dataset): Define a Vertex AI Dataset as code and manage its lifecycle efficiently.
Model Garden deployment (gcp_vertexai_endpoint_with_model_garden_deployment): Deploy foundation models from Google's Model Garden—including Gemini, PaLI, and Hugging Face models—directly to endpoints with a single task. Supports EULA acceptance, custom machine specs, and automated endpoint creation.
Declarative deployment (gcp_vertexai_endpoint): Deploy a registered model with support for traffic splitting, enabling safe, controlled releases and consistent provisioning and management across development, staging, and production environments.
Feature Store management (gcp_vertexai_feature_store): Automate the provisioning of Vertex AI Feature Stores to systematically serve, share, and reuse ML features across your organization.
Vector search and indexing (gcp_vertexai_index): Deploy and manage Vertex AI indexes to power highly scalable similarity search and retrieval-augmented generation (RAG) applications.
Reasoning engine (gcp_vertexai_reasoning_engine): Deploy AI agents and reasoning engines on Vertex AI, including support for source-based deployments with inline Python code, custom requirements, and versioned entrypoints—enabling agentic AI workflows managed as code.

Event-driven automation: Enhancing AIOps and MLOps

Combining these new AI infrastructure modules with Event-Driven Ansible, part of Ansible Automation Platform, increases responsiveness, empowering organizations to handle both MLOps and AIOps use cases efficiently.

MLOps (managing the AI): ML models aren’t static assets; their performance degrades over time as real-world data changes. Instead of waiting for a human operator to notice an issue, Event-Driven Ansible can listen for telemetry and alerts from Vertex AI Model Monitoring. If a system detects "model drift" (a drop in predictive accuracy), Event-Driven Ansible can automatically:

Roll back the Vertex AI endpoint to a previous, stable model version.
Open an IT service management (ITSM) ticket (e.g., in ServiceNow) with detailed diagnostic context.
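
The drift response described above could be wired up with a rulebook along these lines. This is a minimal sketch rather than a tested integration: it assumes Vertex AI Model Monitoring alerts are forwarded as JSON to the ansible.eda.webhook event source, and the payload field (alert_type) and the playbook path are hypothetical names chosen for illustration.

```yaml
---
# Sketch of an Event-Driven Ansible rulebook for model drift alerts.
# Assumption: alerts arrive as JSON POST requests; the payload field
# name "alert_type" is hypothetical and depends on how alerts are forwarded.
- name: Respond to Vertex AI model drift alerts
  hosts: localhost
  sources:
    # Listen for incoming alert webhooks on port 5000
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5000
  rules:
    - name: Roll back the endpoint and open a ticket on model drift
      condition: event.payload.alert_type == "model_drift"
      action:
        run_playbook:
          # Hypothetical playbook that redeploys the previous stable
          # model version and opens an ITSM ticket with alert details
          name: playbooks/rollback_vertex_endpoint.yml
```

Inside the triggered playbook, the matched event is available as ansible_eda.event, so the rollback and ticketing tasks can read the alert details without extra plumbing.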

AIOps (AI managing the infrastructure): Conversely, as organizations increasingly rely on AI-driven observability tools to monitor their sprawling IT environments, Event-Driven Ansible serves as the "action engine" for those insights. When an AIOps platform detects an anomaly, such as a predicted database failure or anomalous network traffic, Event-Driven Ansible can ingest that alert and immediately trigger a remediation playbook. As part of the remediation workflow, Ansible Automation Platform can query AI agents running on the Vertex AI platform and use the response to guide remediation and enrich the ITSM ticket for the alert. Automated remediation can include scaling resources, isolating compromised networks, or clearing disk space before a human operator is even paged.
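
As a rough illustration of the "action engine" role, the rulebook below maps two kinds of anomaly to two remediation playbooks. It is a sketch under stated assumptions: the AIOps platform is assumed to post JSON alerts to a webhook, and the anomaly_type field, its values, and the playbook paths are all hypothetical.

```yaml
---
# Sketch of an Event-Driven Ansible rulebook for AIOps alerts.
# Assumption: the observability tool posts JSON alerts to this webhook;
# "anomaly_type" and its values are hypothetical field names.
- name: Remediate anomalies reported by an AIOps platform
  hosts: all
  sources:
    - ansible.eda.webhook:
        host: 0.0.0.0
        port: 5001
  rules:
    # Each rule pairs one class of anomaly with one remediation playbook
    - name: Clear disk space when disk pressure is predicted
      condition: event.payload.anomaly_type == "disk_pressure"
      action:
        run_playbook:
          name: playbooks/clear_disk_space.yml  # hypothetical path
    - name: Isolate a host showing anomalous network traffic
      condition: event.payload.anomaly_type == "anomalous_network_traffic"
      action:
        run_playbook:
          name: playbooks/isolate_network.yml  # hypothetical path
```

Keeping one rule per anomaly class makes each remediation path reviewable on its own, and events that match no rule are simply ignored.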

Together, the combination of AIOps insights, Vertex AI modules, and Event-Driven Ansible shifts enterprise operations from reactive troubleshooting to a proactive, self-healing ecosystem.

Example: Automating a Vertex AI deployment pipeline

Let's look at what this means in practice. Instead of clicking through the Google Cloud Console or maintaining bash scripts, you can define your ML deployment declaratively. Here is an example Ansible playbook that creates a dataset, deploys a Gemini model from Model Garden to a serving endpoint, configures the RAG Engine, creates a TensorBoard instance, and provisions a shared GPU resource pool:

- name: Deploy Vertex AI Infrastructure
  hosts: localhost
  gather_facts: false
  vars:
    gcp_project: "my-gcp-project"
    gcp_region: "us-central1"
  tasks:
    - name: Create a Vertex AI Image Dataset
      google.cloud.gcp_vertexai_dataset:
        display_name: "production_image_dataset"
        metadata_schema_uri: "gs://google-cloud-aiplatform/schema/dataset/metadata/image_1.0.0.yaml"
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present
    - name: Deploy a Gemini model from Model Garden
      google.cloud.gcp_vertexai_endpoint_with_model_garden_deployment:
        publisher_model_name: >-
          publishers/google/models/gemini-pro
        endpoint_config:
          endpoint_display_name: "production-gemini-endpoint"
        model_config:
          model_display_name: "gemini-pro-production"
          accept_eula: true
        location: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present
    - name: Provision the RAG Engine
      google.cloud.gcp_vertexai_rag_engine_config:
        rag_managed_config: scaled
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present
    - name: Create a TensorBoard for experiment tracking
      google.cloud.gcp_vertexai_tensorboard:
        display_name: "production-tensorboard"
        description: "Experiment tracking for production models"
        labels:
          environment: production
          team: ml-platform
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present
    - name: Provision a shared GPU resource pool
      google.cloud.gcp_vertexai_deployment_resource_pool:
        name: "shared-gpu-pool"
        dedicated_resources:
          min_replica_count: 1
          max_replica_count: 3
          machine_spec:
            machine_type: n1-standard-4
            accelerator_type: NVIDIA_TESLA_T4
            accelerator_count: 1
        region: "{{ gcp_region }}"
        project: "{{ gcp_project }}"
        auth_kind: "application"
        state: present

With just a few lines of YAML, we've gone from data configuration to a live AI endpoint. This playbook can be committed to a source control management system such as GitHub or GitLab, reviewed by peers, and executed via Ansible Automation Platform, providing complete traceability and repeatability while eliminating "it works on my machine" deployment failures.
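
Running the playbook requires the collection that provides the google.cloud modules. A minimal requirements.yml sketch follows. It assumes the collection is installed under the google.cloud name used in the tasks above, and it deliberately pins no version because the release that first ships the Vertex AI modules is not stated here.

```yaml
---
# Sketch of a collection requirements file for the playbook above.
# Assumption: no version is pinned; add a "version:" constraint once
# the release containing the Vertex AI modules is confirmed.
collections:
  - name: google.cloud
```

The file can be installed with ansible-galaxy collection install -r requirements.yml before the playbook is run.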

By automating Vertex AI, you can streamline the deployment, updating, and auditing of AI agents and foundation models, bringing consistent and repeatable AI workflows to your entire environment. The launch of these new capabilities reflects our commitment to cloud automation, extending the benefits of automation to cloud-native AI infrastructure management.

For more technical details on the Vertex AI platform, refer to the official Google Cloud documentation and the Red Hat Ecosystem Catalog.

About the author

Matthew Packer

Principal Product Marketing Manager

Matthew Packer is a Principal Product Marketing Manager for Ansible Automation Platform and is responsible for cloud automation. Prior to joining Red Hat, he worked in product marketing specializing in retail payment technology at Vontier and product management at Cisco in cloud-based networking. Matthew also worked as a consultant at Honeywell in the manufacturing and utilities industries with a focus on the Internet of Things (IoT) and predictive analytics space.
