
As Red Hat engineers, we are always looking to incorporate features that empower administrators and decision makers. Our goal is to enable them to be proactive, efficient, and to help them maximize value from their infrastructure.

To this end, we are working to significantly improve the reporting and metrics API in Red Hat Virtualization Manager, our management platform for virtualized resources. Until recently, Red Hat Virtualization relied on native reports and data warehouse engines to provide access, visualization, and insights into the virtualized and physical infrastructure.

While working to improve the user experience, we found several areas that needed improvement. For example, users often requested real-time monitoring information, but the reports showed only aggregated data in hourly and daily increments. Historical data has its advantages for capacity planning and performance analysis over time, but real-time data makes it possible to address issues faster, minimizing downtime and improving overall performance.

We also received requests to add monitoring for the engine machine itself and for database performance. This is important for managing the engine's resources and for addressing database issues.

Currently, as seen in the data flow image below, the engine is responsible for collecting data from the hosts. This adds load on the engine database, consumes resources, and affects overall engine performance.

image01_metrics.png

Improved Metrics Visualization Benefits

We found that there are now better, faster, more scalable, distributed solutions for metrics collection and processing. One of the major improvements we decided on is to collect the data directly from the hosts. This lowers the load on the engine database and machine, and improves engine performance.

On each host and on the engine, data will be collected by Collectd, a simple and powerful daemon that gathers metrics from various sources (e.g. the operating system, applications, log files and external devices) and makes them available over the network. The data will then be processed by Fluentd, a data collector that unifies the data and sends it to a central metrics store.
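To make the "unify" step more concrete, here is a minimal sketch in Python of what flattening a collected sample into uniform records could look like. It is an illustration only, not how Fluentd is implemented or configured: the `unify` function, the dotted metric naming, and the example host name are hypothetical, and the input is loosely modeled on the value lists Collectd produces (one sample carrying several named values).

```python
from typing import Any, Dict, List


def unify(sample: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Flatten one collectd-style sample into one flat record per value.

    Hypothetical helper for illustration; field names are assumptions.
    """
    records = []
    for name, value in zip(sample["dsnames"], sample["values"]):
        # Build a dotted metric name, skipping empty instance fields.
        parts = [
            sample["plugin"],
            sample.get("plugin_instance", ""),
            sample["type"],
            sample.get("type_instance", ""),
            name,
        ]
        records.append(
            {
                "host": sample["host"],
                "time": sample["time"],
                "metric": ".".join(p for p in parts if p),
                "value": value,
            }
        )
    return records


# Example input: error counters for one network interface (made-up values).
sample = {
    "host": "host1.example.com",
    "time": 1480000000.0,
    "plugin": "interface",
    "plugin_instance": "eth0",
    "type": "if_errors",
    "type_instance": "",
    "dsnames": ["rx", "tx"],
    "values": [3, 0],
}

for record in unify(sample):
    print(record)
```

Each resulting record has the same four fields regardless of which plugin produced it, which is the property a central metrics store needs in order to index and query the data consistently.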

Users will be able to view and analyze the metrics in real time using the visualization tool.

image02_metrics.png

We are very excited about the value this project will provide our users. In upcoming versions we plan to integrate it further with an easy-to-install metrics store solution and to add built-in dashboards and alerting.

When choosing a metrics store solution, we focused on the following features: metadata handling, high availability, scalability, federation, JDBC/ODBC support, a down-sampling option, a supported alerting and notification tool, and supported visualization tools that provide self-service, a diverse set of widgets, and interactivity.
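For readers unfamiliar with down-sampling, the following Python sketch shows the basic idea: raw samples are grouped into fixed time buckets and each bucket is reduced to a single value, here the average. The `downsample` function and the sample data are hypothetical and only illustrate the concept; an actual metrics store would offer its own down-sampling mechanism.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def downsample(
    points: List[Tuple[float, float]], bucket_seconds: int = 3600
) -> List[Tuple[int, float]]:
    """Average (timestamp, value) points into fixed-width time buckets."""
    buckets: Dict[int, List[float]] = defaultdict(list)
    for timestamp, value in points:
        # Align each timestamp to the start of its bucket.
        bucket_start = int(timestamp // bucket_seconds) * bucket_seconds
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


# Four samples taken 30 minutes apart (made-up values), reduced to hourly averages.
points = [(0, 10.0), (1800, 20.0), (3600, 30.0), (5400, 50.0)]
print(downsample(points))  # [(0, 15.0), (3600, 40.0)]
```

Keeping real-time data at full resolution for a short period and down-sampled data for longer periods preserves the capacity planning use case without storing every raw sample indefinitely.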

We will continue adding Collectd plugins and processing additional logs.

In addition, we plan to add smart management based on the metrics store, that is, triggering engine events according to specific criteria. For example, if we see many if_errors on a host's network interface, we can trigger moving the host to non-operational to prevent workload downtime due to network issues.
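As a rough illustration of such a criterion, the Python sketch below sums if_errors per host interface over a window of samples and returns the hosts that exceed a threshold. Everything here is an assumption made for the example: the `hosts_to_flag` function, the threshold of 100, and the sample tuples are hypothetical, and the actual engine event (moving a host to non-operational) is left out because this feature is still only planned.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical threshold: total if_errors per interface within the window.
ERROR_THRESHOLD = 100


def hosts_to_flag(
    samples: Iterable[Tuple[str, str, int]], threshold: int = ERROR_THRESHOLD
) -> List[str]:
    """Return hosts with any interface whose summed if_errors exceed threshold.

    Each sample is (host, interface, errors_in_interval).
    """
    totals: Dict[Tuple[str, str], int] = defaultdict(int)
    for host, interface, errors in samples:
        totals[(host, interface)] += errors
    flagged = {host for (host, _iface), total in totals.items() if total > threshold}
    return sorted(flagged)


# Made-up samples: host1/eth0 accumulates 120 errors, host2/eth0 only 5.
window = [
    ("host1", "eth0", 80),
    ("host1", "eth0", 40),
    ("host2", "eth0", 5),
]
print(hosts_to_flag(window))  # ['host1']
```

In a real deployment the hosts returned by such a rule would be handed to the engine as candidates for an event rather than acted on directly, so that an administrator-defined policy decides what happens next.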

Questions, comments, or feedback on metrics and reporting?  Reach out using the comments section (below).

Yaniv & Shirly

