Skip to main content

Monitor your infrastructure with Streamlit

Learn how to use Streamlit for monitoring and compare it against Grafana and InfluxDB to observe your infrastructure.
Image
NMAP scanning magic

Photo by Mateusz Dach from Pexels

Streamlit is an open source data visualization framework written in Python. In Display your application data with Streamlit, I introduced the framework and provided some basic examples of how to use it.

This article expands on the previous one, covering the following topics:

  • Using Ansible to set up a Prometheus Node Exporter and a scraper to collect metrics about your system.
  • Connecting to the Prometheus scraper to get the metrics and display them in real-time using Streamlit.

You will need the following to complete the tutorial:

  • Elevated permissions to install Prometheus Node Exporter and the scraper.
  • Some experience with Python programming.
  • Curiosity!

The first step is to learn how to collect metrics over time from your machines using Prometheus.

Monitor memory utilization periodically from host machines

Prometheus is an open source framework created to collect metrics about your system. It also provides visualization, efficient storage, and many other cool features.

In a typical on-premises Prometheus setup, you have one or more Node Exporters collecting data and a main scraper aggregating the data from all those nodes.

You can visualize the collected data directly on your agent using the local URL http://prometheuscollector:9090/, or you can have another visualization tool, like Grafana, talk to the collector.

In this article, I'll develop a Streamlit application that talks to Prometheus to visualize this data. First, I'll look at some of the required components.

The Prometheus collector (scraper)

Say your homelab has two machines, named raspberrypi and dmaf5, both running Prometheus Node Exporter agents. You can set up your Prometheus scraper configuration prometheus.yaml to reflect:

---
global:
    scrape_interval: 30s
    evaluation_interval: 30s
    scrape_timeout: 10s
    external_labels:
        monitor: 'nunez-family-lab-monitor'

scrape_configs:
    - job_name: 'node-exporter'
      static_configs:
          - targets: ['raspberrypi.home:9100', 'dmaf5.home:9100']

How can you install all the data acquisition pieces? You can use an Ansible playbook for that.

[ Write your first Ansible playbook in this hands-on interactive lab. ]

Provision metrics collection (or how to automate the boring stuff with Ansible)

Put the following pieces together for the automation recipe to work:

  1. Inventory file: Tells Ansible where to install and deploy the Node Exporter agents
  2. Template to set up the Prometheus daemon that collects the metrics
  3. Systemd unit files for Prometheus and Node Exporter. The Node Exporter runs on each node, and Prometheus runs on the master node.
  4. This playbook that orchestrates all the actions, from copying configurations, to downloading software, and finally starting processes:
# Playbook to provision Prometheus scraper and node-exporter servers
# Download binaries from: https://prometheus.io/download/#prometheus
# Checked with: ansible-lint provision_prometheus.yaml
---
- name: Setup Prometheus on your home lab
  hosts: home_lab
  become_user: root
  become: true
  vars:
    prometheus_scraper_version: "2.42.0"
    prometheus_node_exporter_version: "1.5.0"
    prometheus_url: "https://github.com/prometheus/prometheus/releases/download"
    exporter_url: "https://github.com/prometheus/node_exporter/releases/download"
    prometheus_node_exporter_port: 9100
    prometheus_install_dir: "/opt/prometheus"
    prometheus_data_dir: "/data/prometheus"
    prometheus_conf_dir: "/etc/prometheus"
    prometheus_checksum:
      amd64: "422dab055ed9c7bcaff52b718705f9192c6fac0de6b7e78dd278e70ee2663dcc"
      arm64: "b947e4890d221f6b1d09c963fd0440720f2b5d08c29e190d6365f4b3d3e92a9d"
    prometheus_node_exporter_checksum:
      amd64: "af999fd31ab54ed3a34b9f0b10c28e9acee9ef5ac5a5d5edfdde85437db7acbb"
      arm64: "e031a539af9a619c06774788b54c23fccc2a852d41437315725a086ccdb0ed16"
  tasks:
    - name: Set architecture (home lab has amd64 and arm64)
      ansible.builtin.set_fact:
        prom_arch: "{{ 'amd64' if ansible_architecture == 'x86_64' else 'arm64' }}"
    - name: Prometheus user
      tags: user
      ansible.builtin.user:
        name: prometheus
        system: true
        create_home: false
        comment: "Prometheus service account"
        shell: /sbin/nologin
        state: present
    - name: Prometheus install directory
      ansible.builtin.file:
        state: directory
        owner: "prometheus"
        mode: 0755
        path: "{{ prometheus_install_dir }}"
    - name: Scraper management
      when: "'master_lab' in group_names"
      block:
        - name: Install Prometheus Scraper
          tags: scraper
          ansible.builtin.get_url:
            dest: "/tmp/prometheus-{{ prometheus_scraper_version }}.linux-{{ prom_arch }}.tar.gz"
            url: "{{ prometheus_url }}/v{{ prometheus_scraper_version }}/prometheus-{{ prometheus_scraper_version }}.linux-{{ prom_arch }}.tar.gz"
            checksum: "sha256:{{ prometheus_checksum[prom_arch | default('arm64')] }}"
            mode: "u=rw"
        - name: Unpack Prometheus Scraper
          tags: unpack_scraper
          ansible.builtin.unarchive:
            remote_src: true
            owner: "prometheus"
            group: "prometheus"
            src: "/tmp/prometheus-{{ prometheus_scraper_version }}.linux-{{ prom_arch }}.tar.gz"
            dest: "{{ prometheus_install_dir }}"
          notify:
            - Restart Prometheus
        - name: Prometheus config directory
          ansible.builtin.file:
            state: directory
            owner: "prometheus"
            group: "prometheus"
            mode: 0770
            path: "{{ item }}"
          loop:
            - "{{ prometheus_conf_dir }}"
            - "{{ prometheus_data_dir }}"
        - name: Install Prometheus configuration
          tags: config_prometheus_copy
          ansible.builtin.template:
            src: templates/prometheus.yaml.j2
            dest: /etc/prometheus/prometheus.yaml
            owner: prometheus
            group: prometheus
            mode: 0644
          notify:
            - Restart Prometheus
        - name: Install Prometheus systemd unit
          tags: systemd_prometheus_copy
          ansible.builtin.template:
            src: templates/prometheus.service.j2
            dest: /etc/systemd/system/prometheus.service
            owner: root
            group: root
            mode: 0644
          notify:
            - Restart Prometheus
        - name: Make sure Prometheus is running
          tags: systemd_prometheus_start
          ansible.builtin.systemd:
            state: started
            name: prometheus
            enabled: true
            daemon_reload: true
    - name: Install Prometheus Node exporter
      tags: exporter
      ansible.builtin.get_url:
        dest: "/tmp/node_exporter-{{ prometheus_node_exporter_version }}.linux-{{ prom_arch }}.tar.gz"
        url: "{{ exporter_url }}/v{{ prometheus_node_exporter_version }}/node_exporter-{{ prometheus_node_exporter_version }}.linux-{{ prom_arch }}.tar.gz"
        checksum: "sha256:{{ prometheus_node_exporter_checksum[prom_arch | default('arm64')] }}"
        mode: "u=rw"
    - name: Unpack Node exporter
      tags: unpack_exporter
      ansible.builtin.unarchive:
        remote_src: true
        owner: "prometheus"
        group: "prometheus"
        src: "/tmp/node_exporter-{{ prometheus_node_exporter_version }}.linux-{{ prom_arch }}.tar.gz"
        dest: "{{ prometheus_install_dir }}"
      notify:
        - Restart Node Exporter
    - name: Install Node Exporter systemd unit
      tags: systemd_node_exporter_copy
      ansible.builtin.template:
        src: templates/node_exporter.service.j2
        dest: /etc/systemd/system/node_exporter.service
        owner: root
        group: root
        mode: 0644
      notify:
        - Restart Node Exporter
    - name: Install Node Exporter environment overrides
      tags: systemd_node_exporter_env
      ansible.builtin.copy:
        src: files/prometheus-node-exporter
        dest: /etc/default/prometheus-node-exporter
        owner: root
        group: root
        mode: 0644
      notify:
        - Restart Node Exporter
    - name: Make sure Node Exporter is running
      tags: systemd_node_exporter_start
      ansible.builtin.systemd:
        state: started
        name: node_exporter
        enabled: true
        daemon_reload: true
  handlers:
    - name: Restart Node Exporter
      tags: restart_node_exporter
      ansible.builtin.service:
        name: node_exporter
        state: restarted
    - name: Restart Prometheus
      tags: restart_prometheus
      ansible.builtin.service:
        name: prometheus
        state: restarted

You can see it in action:

$ ansible-playbook --inventory inventory provision_prometheus.yaml

After running the playbook, you can check whether it collects data by connecting to the Prometheus host you set up in your inventory using the URL http://prometheuscollector:9090/. Then you can run queries using PromQL like this one:

rate(node_cpu_seconds_total{mode="system"}[1m])
Image
Prometheus web interface data query
(Jose Nunez, CC BY-SA 4.0)

Display Prometheus metrics with Streamlit

Prometheus supports queries using the PromQL language. It also offers a REST API to expose those metrics to clients.

To demonstrate how the monitoring works, use the following metric:

node_memory_MemFree_bytes

If you query the scraper node, it returns a JSON response like this:

{
 "status": "success",
 "data": {
  "resultType": "matrix",
  "result": [
   {
    "metric": {
     "__name__": "node_memory_MemFree_bytes",
     "instance": "dmaf5.home:9100",
     "job": "node-exporter"
    },
    "values": [
     [
      1678066203.119,
      "1086185472"
     ],
     [
      1678066233.119,
      "1086185472"
     ],
     [
      1678066263.119,
      "1086185472"
     ]
    ]
   }
  ]
 }
}

You can see the full example in my GitHub repo.

The next step is writing a simple Streamlit application to collect the metrics and display them using a line chart:

import json
import logging
from datetime import datetime, timedelta
from json.decoder import JSONDecodeError
import os
import textwrap
import altair
import requests
import streamlit as st
from pandas import DataFrame, Series, Timestamp
from numpy import float64
from requests import HTTPError, RequestException

MINUTES_BACK = 60
DEFAULT_TIME_BACK = timedelta(minutes=-MINUTES_BACK)
DEFAULT_QUERY = 'node_memory_MemFree_bytes'
STEP_DURATION = "30s"


@st.cache_data
def full_url(url: str, has_time_range: bool = True) -> str:
    if has_time_range:
        return f"{url}/api/v1/query_range"  # Range query
    return f"{url}/api/v1/query"  # Instant query


def get_metrics(
        the_payload: dict[str, any],
        url: str,
        start_range: datetime = None,
        end_range: datetime = None
) -> (dict[any, any], int):
    new_query = {}
    new_query.update(the_payload)
    if start_range and end_range:
        new_query['start'] = start_range.timestamp()
        new_query['end'] = end_range.timestamp()
        new_query['step'] = STEP_DURATION
    logging.info("url=%s, params=%s", url, new_query)
    response = requests.get(url=url, params=new_query)
    return response.json(), response.status_code


def transform(m_data: dict[any, any]) -> DataFrame:
    """
    Convert a Prometheus data structure into a Panda DataFrame
    :param m_data:
    :return: DataFrame
    """
    df = DataFrame({
        mtr['metric']['instance']: Series(
            data=[float64(vl[1]) for vl in mtr['values']],
            index=[Timestamp(vl[0], unit='s') for vl in mtr['values']],
            name="Free memory (bytes)"
        ) for mtr in m_data['data']['result']
    })
    logging.info(f"Columns: {df.columns}")
    logging.info(f"Index: {df.index}")
    logging.info(f"Index: {df}")
    return df


if __name__ == "__main__":

    st.title("Realtime Prometheus monitoring")
    data_load_state = st.text('No data loaded yet...')
    if 'PROMETHEUS_URL' not in os.environ:
        st.markdown("## Please define the following environment variable and restart this application (example below):")
        st.code(textwrap.dedent(f"""
        PROMETHEUS_URL="http://raspberrypi:9090/"
        export PROMETHEUS_URL
        streamlit run {__file__}
        """))
        st.markdown(
            "New to Prometheus?. Please check the [Official](https://prometheus.io/docs/prometheus/latest/querying/api/) documentation")
        data_load_state.error("No data was loaded.")
    else:
        code = 0
        metrics = {}
        try:
            PROM_URL = full_url(os.environ['PROMETHEUS_URL'], has_time_range=True)
            st.info(f"Using '{PROM_URL}'")
            query = DEFAULT_QUERY
            payload = {'query': query}
            # First query we boostrap with a reasonable time range
            END: datetime = datetime.now()
            START = END + DEFAULT_TIME_BACK
            if payload:
                (graph, raw) = st.tabs(["Time Series", "Debugging"])
                metrics, code = get_metrics(
                    url=PROM_URL,
                    the_payload=payload,
                    start_range=START,
                    end_range=END,
                )
                data: DataFrame = DataFrame()
                if code == 200:
                    now = datetime.now()
                    data_load_state.info(f"Metrics data refreshed ({now}).")
                    logging.info(f"Metrics data refreshed ({now}).")
                    try:
                        data = transform(m_data=metrics)
                        with graph:
                            st.title("Time series")
                            # See auto-refresh dilema: https://github.com/streamlit/streamlit/issues/168
                            if st.button('Click to refresh!'):
                                st.write("Refreshing")
                                st.experimental_rerun()
                            # st.line_chart(data=data)
                            # You can see all the possible type encodings here: 
                            # https://altair-viz.github.io/altair-tutorial/notebooks/02-Simple-Charts.html
                            chart = (
                                altair.Chart(data.stack().rename("value").reset_index()).mark_line().encode(
                                    x="level_0:T",
                                    y="value:Q",
                                    color="level_1"
                                )
                            )
                            st.altair_chart(chart, use_container_width=True)                            

                    except ValueError as val:
                        st.exception(val)
                    with raw:
                        if not data.empty:
                            st.title("DataFrame for Free memory (bytes)")
                            st.dataframe(data)
                        st.title("Query:")
                        st.markdown(f"```{query}, start={START}, end={END}```")
                        st.title("Prometheus data:")
                        st.json(metrics)
                else:
                    st.warning(f"Hmm, invalid query?: {query}")
                    st.warning(json.dumps(metrics, indent=True))
        except (HTTPError, JSONDecodeError, RequestException, KeyError) as exp:
            st.error(f"There was a problem while running the query (HTTP_CODE={code})...")
            if isinstance(exp, KeyError):
                st.code(f"Metrics={json.dumps(metrics, indent=True)}...")
            st.exception(exp)

A few things to note:

  • The data source is a Prometheus scraper that returns a JSON document. It's converted to a Pandas DataFrame, one of the most well-supported formats on Streamlit.
  • Next, it adds the graphical components; one tab to show the time-series data and the other to display the data and queries in tabular format for debugging purposes.
  • It also adds a button to refresh the plot data manually. There are ways to auto-refresh the contents.

Before running the application, reload the Python virtual environment you created for Streamlit. For additional information, revisit Display your application data with Streamlit.

What does it look like? See it in this screenshot of the Prometheus times-series line chart:

Image
Prometheus monitoring data displayed with Streamlit plot
(Jose Nunez, CC BY-SA 4.0)

Here is the debugging tab, where you can see the raw data, queries, and other useful stuff:

Image
Prometheus monitoring data Streamlit debug tab
(Jose Nunez, CC BY-SA 4.0)

Next, I'll compare Steamlit with other popular visualization tools to help you decide if it's a good fit for a DevOps or systems administrator.

Compare Streamlit with other visualization tools

A brief comparison of Grafana, InfluxDB, and Streamlit out-of-the-box capabilities can help you decide which one to use. As usual, no single tool can do everything, and you most likely will need a mix of them.

For this analysis, the focus is on observability in general. Some features may be more or less appealing to you.

All three open source solutions support tables and time series graphics. Specific features of each (and that may be particularly interesting if your focus is on data science) include:

  • Grafana 9.4:
    • Embedded alerting: Yes, multiple destinations supported
    • Multiple data sources: Yes
    • Supported languages: Golang, depends on the used data source (JSONPath, FluxQL)
    • Automatic data refresh: Yes
    • Easy to write a "live" paper? Supports Markdown but requires write permission on the dashboard
  • InfluxDB 2.6.1:
    • Embedded alerting: Yes
    • Multiple data sources: No
    • Supported languages: Flux, InFluxQL
    • Automatic data refresh: Yes
    • Easy to write a "live" paper? Supports Markdown but requires write permission on the dashboard
  • Streamlit 1.20.0:
    • Embedded alerting: No, requires external libraries and logic
    • Multiple data sources: Yes
    • Supported languages: Python
    • Automatic data refresh: No, DIY
    • Easy to write a "live" paper? Full freedom with Markdown, videos, intermixed

These three tools overlap in many areas, but there are key differences:

  1. They can all run in containers, which may simplify their deployment.
  2. Grafana is the most feature-rich and needs to interface with an external data source like Prometheus, InfluxDB, or a JSON REST service to visualize data. It is not a database and has powerful alerting mechanisms (email, chat applications, and more).
  3. InfluxDB is a time-series database that also has dashboards and alerting capabilities, though more limited than Grafana.
  4. Streamlit is a framework written in Python for data science applications. It can interface with infinite data sources. Among these three, it's the easiest to set up, but it is not a database and has no alerting capabilities.

So which one should you use? It depends. If you need quick prototyping and no alerting, Streamlit is a good fit; if you only need a database with simple visualization and alerting, then InfluxDB is for you; and for a more complete solution, although it's more complex to set up, you can't go wrong with Grafana.

Not everything is dashboards: A TCP port tester written with a Streamlit GUI

As a bonus, I wrote a simple TCP port scanner with Streamlit. You just need to define a YAML configuration file that tells the app which machines and ports to scan:

# Description of host and TCP ports to check
---
hosts:
  - name: raspberrypi.home
    ports:
      - 3000
      - 22
      - 9090
  - name: dmaf5.home
    ports:
      - 22
  - name: www.redhat.com
    ports:
      - 80
      - 443
  - name: github.com
    ports:
      - 80
      - 443

[ Get the YAML cheat sheet ]

The script goes through every port and checks if it's open or not. Note that this app is not meant to be stealth, and it's not fast, as it performs a sequential port scan:

#!/usr/bin/env python3
import logging
import textwrap
import socket
from typing import Any
from yaml import load
import streamlit as st

try:
    from yaml import CLoader as Loader
except ImportError:
    from yaml import Loader


PROGRESS_TEXT = "Scanning hosts. Please wait"


def check_tcp_port_xmas(dst_ip: str, dst_port: int) -> str:
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            result = sock.connect_ex((dst_ip, dst_port))
            if result == 0:
                return "Open"
            else:
                return "Closed"
    except (TypeError, PermissionError) as perm_err:
        logging.exception(perm_err)
        return "Error"


def load_config(stream) -> Any:
    return load(stream, Loader=Loader)


if __name__ == "__main__":
    st.title("TCP Port scanner")
    st.markdown(textwrap.dedent("""
    Simple TCP/ IP port scanner.
"""))

    st.file_uploader(
        "Please provide the configuration file to load",
        accept_multiple_files=False,
        key="portscan_config"
    )
    if st.session_state['portscan_config']:
        yaml = load_config(st.session_state['portscan_config'])
        hosts_details = yaml['hosts']
        chunks = len(hosts_details)
        try:
            data_load_state = st.text('Preparing to scan...')
            with st.spinner(f"Total hosts to scan: {chunks}"):
                ip = None
                for host in hosts_details:
                    host_name = host['name'].strip()
                    ports = host['ports']
                    try:
                        ip = socket.gethostbyname(host_name)
                        for port in ports:
                            status = check_tcp_port_xmas(dst_ip=ip, dst_port=port)
                            data_load_state.text(f"Processing: {host_name}({ip}):{port}, status={status}")
                            if status == "Open":
                                st.info(f"{host_name}:{port}, status={status}")
                            elif status == "Closed":
                                st.warning(f"{host_name}:{port}, status={status}")
                            else:
                                st.error(f"{host_name}:{port}, status={status}")
                    except TypeError as os_err:
                        raise
            data_load_state.success(f"Finished scanning {chunks} hosts")
        except (KeyError, ValueError, OSError, TypeError) as err:
            st.error(hosts_details)
            st.exception(err)
    else:
        st.warning("Please load a PortTester configuration file to proceed")

This Python script requires some additional dependencies as listed in requirements.txt:

streamlit==1.20.0
pyyaml==6.0
pandas

Install the dependencies using pip:

$ pip3 install -r requirements.txt

Now run the script:

$ streamlit run porttester/port_tester.py

You will see something similar to this:

Image
TCP port scanner results displayed with Streamlit
(Jose Nunez, CC BY-SA 4.0)

Wrap up

Streamlit is well tuned for data-science applications, but you can also use it for infrastructure monitoring. You can also create live reports to quickly check the health of your sites, with very little coding. Here's a summary:

[ Want to test your sysadmin skills? Take a skills assessment today. ]

Topics:   Data visualization   Monitoring  
Author’s photo

Jose Vicente Nunez

Proud dad and husband, software developer and sysadmin. Recreational runner and geek. More about me

Try Red Hat Enterprise Linux

Download it at no charge from the Red Hat Developer program.