
In this article, you’ll learn about the Performance Co-Pilot (PCP) tool and how we take advantage of it to implement system and application monitoring for Red Hat Ansible Automation Platform.

What is Performance Co-Pilot (PCP)?

PCP is an open source performance monitoring and analysis framework, originally created at SGI and now developed by an open source community in which Red Hat is a major contributor. It provides a suite of tools, libraries and services to monitor, retrieve and analyze performance metrics from different systems, services and applications. PCP is designed for scalability, enabling it to monitor anything from a single server to a large, distributed network of machines in real time.

Key features of PCP:

  1. Scalability: PCP can be used to monitor both individual systems and distributed environments
  2. Multisource data collection: It collects data from multiple sources, including the operating system (OS), databases, network interfaces and custom applications
  3. Extensibility: New metrics can be added by developing custom agents or extensions
  4. Storage and retrieval: PCP can store performance data for historical analysis and supports real-time data retrieval
  5. Real-time monitoring: It provides real-time metrics, enabling live performance analysis
  6. Graphical and command-line interfaces: PCP includes both graphical (e.g., pmchart) and command-line tools (e.g., pminfo, pmval and pmlogsummary) for monitoring and performance data analysis

Typical components:

  • Performance Metrics Collector Daemon (PMCD): The central daemon that gathers metrics from agents
  • Performance Metrics Name Space (PMNS): A hierarchical namespace that organizes the performance metrics
  • Performance Metrics Inference Engine (PMIE): A tool for generating alerts or actions based on real-time metric thresholds
  • PMLogger: For logging performance metrics for later analysis
  • PMProxy: A protocol proxy that lets PCP monitoring clients connect to one or more PMCD instances through it

Use cases:

  • System performance analysis: PCP can monitor CPU, memory, disk I/O, network usage and other system metrics
  • Application monitoring: PCP can monitor specific applications or services to understand their resource consumption and performance trends
  • Historical data analysis: It can store performance data over time for historical trend analysis or forensic analysis after system failures

Why monitor Ansible Automation Platform using PCP?

Monitoring Ansible Automation Platform using PCP is important for several reasons:

  1. Performance insights: PCP provides detailed metrics and insights into the performance of your Ansible Automation Platform. This helps in identifying bottlenecks and optimizing resource usage
  2. Proactive issue detection: By continuously monitoring performance metrics, you can detect potential issues before they escalate into significant problems, allowing for proactive troubleshooting
  3. Resource management: Understanding resource utilization (CPU, memory, disk I/O) helps in effective capacity planning and ensures that your automation environment runs smoothly without resource contention
  4. Scalability: As your automation needs grow, monitoring helps assess when and how to scale your Ansible Automation Platform infrastructure, ensuring it can handle increased workloads without degradation in performance
  5. Compliance and auditing: Monitoring tools can help maintain compliance with internal and external regulations by providing a clear audit trail of automation activities and resource usage
  6. Integration with other tools: PCP can integrate with other monitoring and alerting systems, providing a comprehensive view of your infrastructure and enabling better incident response
  7. User experience: Ensuring that your automation tasks run efficiently improves the overall user experience for teams relying on Ansible Automation Platform for deployment and configuration management
  8. Historical data analysis: PCP retains historical performance data, allowing you to analyze trends over time, which is essential for making informed decisions about future infrastructure changes or optimizations

In summary, using PCP to monitor Ansible Automation Platform enhances performance, reliability and efficiency, so that automation efforts contribute positively to organizational goals.

Set up monitoring on Ansible Automation Platform using PCP

Currently, monitoring setup in Ansible Automation Platform is supported for both traditional and containerized installations on virtual machines (VMs). To enable monitoring, set the setup_monitoring boolean to True under the [all:vars] section of the setup inventory file. For example:

[all:vars]
setup_monitoring = True
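
Before running the installer, it can be useful to confirm that the flag is actually set. The following Python sketch is one possible pre-flight check and is not part of the installer: it parses inventory text with the standard configparser module and reports whether [all:vars] sets setup_monitoring to a true value. The function name monitoring_enabled, the host gateway.example.com and the [automationgateway] group in the sample are illustrative assumptions, and configparser only approximates Ansible's INI inventory syntax.

```python
import configparser


def monitoring_enabled(inventory_text: str) -> bool:
    """Return True if [all:vars] sets setup_monitoring to a true value."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # host lines have no "= value" part
        delimiters=("=",),     # do not split on ":" (ports, IPv6 addresses)
        interpolation=None,    # treat "%" literally
        strict=False,          # tolerate repeated hosts or sections
    )
    parser.read_string(inventory_text)
    if not parser.has_section("all:vars"):
        return False
    return parser.getboolean("all:vars", "setup_monitoring", fallback=False)


# Illustrative inventory; group and host names are placeholders.
SAMPLE = """\
[automationgateway]
gateway.example.com

[all:vars]
setup_monitoring = True
"""

print(monitoring_enabled(SAMPLE))
```

To check a real inventory, read the file contents and pass them to monitoring_enabled.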

When you run the installer, it executes the monitoring role to configure PCP on the Ansible Automation Platform cluster. This role installs and activates the necessary services, including pcp, pmcd and pmproxy. On the traditional RPM-based deployment, PCP is installed via DNF and runs under systemd. On the containerized install, it runs in a container alongside all other Ansible Automation Platform components. Additionally, the installer sets up Performance Metrics Domain Agents (PMDAs), which are plug-ins that run as daemons for pmcd, to monitor key components such as nginx, redis, postgres and openmetrics on the Ansible Automation Platform hosts.

Furthermore, on the traditional installation, the installer designates the gateway node as the central hub that collects PCP metrics from all nodes in the Ansible Automation Platform cluster, so that metrics are archived in one place.

PCP uses port 44322 to expose metrics. Make sure that port 44322 is open in your security groups, if applicable. If it is not, metrics are still available on the host for local analysis with the PCP command-line tools, but external tools cannot aggregate them.
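
A quick way to verify reachability from a machine that will aggregate metrics is to attempt a TCP connection to the port. The Python sketch below does that and also builds a URL for pmproxy's REST interface (PMWEBAPI), which pmproxy serves on the same port. The helper names port_is_open and fetch_url and the host gateway.example.com are hypothetical, and the sketch assumes the REST interface is reachable over plain HTTP in your deployment, which may not be the case.

```python
import socket
from typing import Iterable
from urllib.parse import urlencode

PMPROXY_PORT = 44322  # port named in this article


def port_is_open(host: str, port: int = PMPROXY_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def fetch_url(host: str, names: Iterable[str], port: int = PMPROXY_PORT) -> str:
    """Build a /pmapi/fetch URL for the given metric names (HTTP is assumed)."""
    query = urlencode({"names": ",".join(names)}, safe=",")
    return f"http://{host}:{port}/pmapi/fetch?{query}"
```

For example, port_is_open("gateway.example.com") reports whether the port answers, and fetch_url("gateway.example.com", ["mem.freemem"]) gives a URL that can be opened with curl or urllib.request.urlopen once the port is reachable.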

Once the setup is complete, you can log in via SSH to any of the gateway nodes and use the PCP command-line tools, for example pminfo, to check all the metrics that PCP is collecting.

Retrieving archived metrics

You can use the PCP CLI tools to retrieve metrics from an archive file. In a traditional installation, the archives are located under /var/log/pcp/pmlogger/

For example:

/var/log/pcp/pmlogger/controller.example.com/20241004.00.10

In a containerized installation, the archives are located under /home/ansible/aap/pcp_archives

For example:

/home/ansible/aap/pcp_archives/controller.example.com/20241004.00.10
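
The paths above are archive base names rather than single files: a PCP archive is stored as a set of files that share the base name, with a .meta file, a .index file and numbered data volumes, any of which may be compressed. The Python sketch below, with the hypothetical helper name find_archives, walks a directory tree and returns the base names you can pass to the PCP tools as the archive location. It assumes every archive still has its .meta file.

```python
from pathlib import Path
from typing import List


def find_archives(root: str) -> List[str]:
    """Return sorted archive base names found under root.

    An archive is detected by its metadata file, named <base>.meta or
    <base>.meta.<compression suffix>.
    """
    bases = set()
    for meta in Path(root).rglob("*.meta*"):
        name = meta.name
        base = name[: name.index(".meta")]  # strip ".meta" and anything after
        bases.add(str(meta.with_name(base)))
    return sorted(bases)
```

Calling find_archives("/var/log/pcp/pmlogger") on a traditional installation, or find_archives("/home/ansible/aap/pcp_archives") on a containerized one, should list one entry per archive, grouped by host directory.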

Examples

  • To list all metrics that were enabled when the archive file was created, enter the following command:

    # pminfo --archive <ARCHIVE_FILE_LOCATION>
  • To view the host and time period covered by an archive file, enter the following command:

    # pmdumplog -l <ARCHIVE_FILE_LOCATION>
  • To list disk writes for each partition over the time period covered by the archive file:

    # pmval --archive <ARCHIVE_FILE_LOCATION> \
    -f 1 disk.partitions.write
  • To list disk write operations per partition, with a 2-second interval, over the time period between 14:00 and 14:15:

    # pmval --archive <ARCHIVE_FILE_LOCATION> \
    -d -t 2sec \
    -f 3 disk.partitions.write \
    -S @14:00 -T @14:15
  • To list average values of the disk.partitions.write and mem.freemem metrics, including the time and value of the minimum/maximum, over the time period between 14:00 and 14:30, and format the values as a table:

    # pmlogsummary <ARCHIVE_FILE_LOCATION> \
    -HlfiImM \
    -S @14:00 \
    -T @14:30 \
    disk.partitions.write \
    mem.freemem
  • To display system metrics stored in an archive, starting from 14:00, in an interactive manner similar to the top tool:

    # pcp --archive <ARCHIVE_FILE_LOCATION> \
    -S @14:00 \
    atop
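
When the same query has to be repeated across several archives or metrics, it can help to build the command line programmatically. The Python sketch below assembles a pmval invocation from the options used in the examples above (-t, -f, -S and -T); it leaves out the -d flag for simplicity. The function name pmval_command and its parameters are illustrative assumptions, and the sketch only constructs the argument list without running anything.

```python
import shlex
from typing import List, Optional


def pmval_command(
    archive: str,
    metric: str,
    start: Optional[str] = None,      # e.g. "14:00"
    end: Optional[str] = None,        # e.g. "14:15"
    interval: Optional[str] = None,   # e.g. "2sec"
    precision: Optional[int] = None,  # digits after the decimal point
) -> List[str]:
    """Build the argument list for a pmval query against an archive."""
    cmd = ["pmval", "--archive", archive]
    if interval:
        cmd += ["-t", interval]
    if precision is not None:
        cmd += ["-f", str(precision)]
    if start:
        cmd += ["-S", f"@{start}"]
    if end:
        cmd += ["-T", f"@{end}"]
    cmd.append(metric)
    return cmd


cmd = pmval_command(
    "/var/log/pcp/pmlogger/controller.example.com/20241004.00.10",
    "disk.partitions.write",
    start="14:00",
    end="14:15",
    interval="2sec",
    precision=3,
)
print(shlex.join(cmd))  # shell-quoted form, ready to copy and paste
```

The returned list can be passed directly to subprocess.run on a host where the PCP tools are installed.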

Takeaways

Monitoring Ansible Automation Platform is essential for the reliability, performance and security of the services it supports. It helps detect and address issues like slow response times, server errors and security vulnerabilities in real time, minimizing downtime and potential disruptions to users. By continuously tracking key metrics, such as traffic, usage and resource consumption, monitoring enables the platform to operate at optimal efficiency.

About the author

Nikhil Jain is a Principal Software Engineer with Red Hat’s Performance and Scale Engineering team who focuses on the testing, analysis and improvement of Red Hat Ansible Automation Platform products and services.