Today we're proud to announce the open sourcing of our monitoring scripts. The OpenShift Online Operations Team has published the OpenShift-Zabbix repository, which contains the scripts used to monitor an OpenShift installation.
We use these scripts to monitor the OpenShift Online environment with Zabbix. They are also aimed at giving OpenShift Enterprise and OpenShift Origin users a good starting point for monitoring their own OpenShift deployments.
Repository Structure
The OpenShift-Zabbix repository is structured as a standard Puppet module. We don't expect every consumer to use Puppet in their infrastructure; Puppet is simply the tool used in the OpenShift Online environment, and the module format also serves as documentation for how the scripts are intended to be deployed. You can consume the Puppet manifests as-is, or use them as a guide for integrating the scripts into your own configuration management infrastructure.
The check scripts themselves reside in the files/checks/ directory, with supporting library files in the files/lib/ directory. These are expected to be deployed to a bin/ and lib/ directory, respectively (e.g. /usr/local/bin and /usr/local/lib).
Also included in the repository is the files/xml/ directory. This directory contains XML-based template files used by Zabbix to create items, triggers, and graphs. These files will make it easy to quickly configure Zabbix for monitoring the supplied data points.
Finally, the manifests/ directory contains the configuration documentation in the form of Puppet code. The primary details found here are the package requirements for the scripts and the cron jobs that ultimately execute the check scripts to push data to the Zabbix server.
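For illustration only (the schedule and user below are assumptions, not taken from the manifests; the install path mirrors the /usr/local/bin example above), a cron entry driving one of the checks might look like:

```
# Run the MCollective ping check every five minutes as the zabbix user
*/5 * * * * zabbix /usr/local/bin/check-mc-ping
```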
Design Decisions
The decision to use cron to execute the monitoring checks is a result of operational experiences gained by the OpenShift Online Operations Team. We found that Zabbix agent checks tend to have difficulty running at scale. As the number of items grows, the zabbix-agent and the zabbix-server's poller processes can struggle to collect data in a timely manner.
Our solution is to run all of our checks via the zabbix_sender command, which reads a file containing item data and pushes that data up to the Zabbix server. Some of our checks measure things with inherent, unpredictable latency, such as a ping through MCollective from the Broker to the Nodes. This made cron a reasonable choice, given the tradeoffs we needed to make between having fast checks and monitoring inherently variable components.
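As a minimal sketch of this batch pattern (the hostname, item keys, and values below are hypothetical examples, not the repository's actual keys), a check builds a file of `hostname key value` lines and then hands the whole batch to zabbix_sender in a single call:

```shell
#!/bin/sh
# Sketch of the zabbix_sender batch pattern. The hostname, item keys,
# and values are hypothetical placeholders.

DATAFILE=$(mktemp)
HOST="node1.example.com"

# zabbix_sender's input format is one item per line:
#   <hostname> <item key> <value>
printf '%s openshift.mc.ping.time 42\n' "$HOST" >> "$DATAFILE"
printf '%s openshift.district.gears.free 117\n' "$HOST" >> "$DATAFILE"

BATCH=$(cat "$DATAFILE")
echo "$BATCH"

# In production, the whole batch would then be pushed in one call, e.g.:
#   zabbix_sender -z zabbix.example.com -i "$DATAFILE"
rm -f "$DATAFILE"
```

Batching like this means one network round trip per check run instead of one per item, which is what keeps the approach workable as the item count grows.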
Current Checks
The initial release includes five check scripts. Here is a brief description of what each one does.
- check-accept-node
- Runs the 'oo-accept-node' command, attempts some automated fixes to known bugs/problems, and then returns the command status.
- check-activemq-stats
- Collects statistics within the JVM running ActiveMQ for tracking common performance data.
- check-district-capacity
- Runs the 'oo-stats' command to collect data about capacity utilization of an OpenShift district. Reports available uids and gears within a district.
- check-mc-ping
- Runs 'mco ping', collates the results, and identifies nodes that are not responding in a timely fashion.
- check-user-action-log
- Scans /var/log/openshift/user_action.log, parsing event data from the log to provide insight into the health of the broker and the general user experience of interacting with the service.
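The checks above share a common shape: run a command, reduce its result to numeric item values, and hand those to zabbix_sender. A hedged sketch of that pattern (here `true` stands in for a real check command such as oo-accept-node, and the hostname and item key are placeholders, not the repository's real names):

```shell
#!/bin/sh
# Minimal check-script skeleton: run a command and report its exit
# status as a Zabbix item value. 'true' is a stand-in for a real check;
# the hostname and item key are hypothetical placeholders.

HOST="node1.example.com"
CHECK_CMD="true"

$CHECK_CMD >/dev/null 2>&1
STATUS=$?

# A value of 0 means the check passed; non-zero is the failure code.
RESULT_LINE="$HOST openshift.check.status $STATUS"
echo "$RESULT_LINE"

# In production this line would be appended to a batch file and sent:
#   zabbix_sender -z zabbix.example.com -i batchfile
```

On the Zabbix side, a trigger on the item (e.g. "last value is non-zero") turns the exit status into an alert.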
Get started!
The OpenShift-Zabbix repository should contain everything you need to start monitoring your OpenShift deployment; the README.md documents everything you'll need to get going. If you're interested in contributing and collaborating with us, fork the code and send us a pull request; more information is in the COLLABORATING.md file.
For us, this is a starting point for open sourcing more of our monitoring work over time. We are interested in providing examples and ideas about how to make the most of your OpenShift installation. In addition, we would like to encourage all operations teams running OpenShift to engage in a conversation about how we keep our infrastructure running. We are excited to help make running OpenShift in a production environment an even easier and better experience.
Next Steps
- Try these scripts out in your environment and let us know what you think in the comments
- Don't have a running OpenShift environment to try this out? Install OpenShift Origin with one shell command and you'll be able to do it in no time.
- Watch the video above to learn monitoring best practices for OpenShift