Skip to main content

Build a Grafana dashboard to visualize data using Ansible and Podman

Examine your data in a user-friendly dashboard that shows multiple views of the same data.
Image
Photo of a laptop computer showing data visualization graphs

Photo by Carlos Muza on Unsplash

If you live or work in New York and running is your hobby, you may have run one of the New York Road Running (NYRR) races. Even if you're not a runner, you might have heard about its most famous race, the New York City Marathon. The NYRR website says:

Our history is rich because we run with passion. It all started with a local running club of just 40 people, but over the past 60+ years, committed runners have joined us in droves. Time, commitment, and our never-ending mission to help and inspire people through running have made us the world's premier community running organization.

NYRR has one of the largest databases of races for many distances, from 5Ks to marathons. Over the years, I've run many NYRR races, and I was curious about my running history. I downloaded a complete list of my race results into Excel format, massaged it a bit, and then asked more questions. I'll use my race results as the dataset for this tutorial, but you can use these instructions to create a Grafana dashboard to visualize another dataset of your choice.

[ Get the guide to installing applications on Linux. ]

This tutorial shows how to:

  • Set up a Podman container that provides access to your data in CSV format.
  • Set up a CSV data source in Grafana.
  • Create a dashboard that shows multiple views of the same data, including annotations.

Requirements

Before you can start visualizing your data, you need:

  • Permission to install software as root on a Linux distribution (I use Fedora; feel free to run any Linux distro that you like)
  • Python 3, which is installed with most Linux distributions
  • Ansible, which I'll show you how to install
  • Podman, and I will show you how to install it
  • Grafana, preferably running in a container (but a bare-metal installation also works), with the ability to install plugins such as the CSV plugin
  • Curiosity, the most crucial requirement

First, install Ansible (if it is not there) to get started. In Fedora, you can install Ansible using DNF:

$ sudo dnf -y install ansible

Consult the Ansible installation instructions for other distributions.

Add the extra tools for Ansible, including Podman support:

$ ansible-galaxy collection install containers.podman

[ Want to test your sysadmin skills? Take a skills assessment today. ]

Prepare the data file

After I downloaded my race data in Excel format, I had to edit the race-date format before exporting the resulting file to CSV. Here's my running results page:

Image
Export race results
(Jose Vincente Nunez, CC BY-SA 4.0)

The resulting CSV looks like this:

Event Name,Event Date,Distance,Finish Time,Pace,Gun Time,Overall Place,Gender Place,Age-Group Place,Age-Graded Time,Age-Graded Place,Age-Graded Percent
2018 NYRR Virtual GOOAAALLL 5K,07/15/2018,5 km,0:24:06,07:46,0:24:06,516,469,59,0:22:02,375,58.97
2018 NYRR Virtual Global Running Day 1M,06/10/2018,1 mile,0:06:31,06:31,0:06:31,203,182,23,0:05:56,138,62.57
2016 Abbott Dash to the Finish Line 5K,11/05/2016,5 km,0:23:27,07:33,0:29:40,1055,819,94,0:21:44,723,59.76
ING New York City Marathon 2013,11/03/2013,Marathon,4:27:18,10:12,4:29:56,27160,19238,4046,4:19:57,21389,47.30

This CSV is straightforward, with no need for further transformations. 

The next step is to make this CSV file available over HTTP, so Grafana can use it. You will use a Podman container for that.

[ Download now: Podman basics cheat sheet ]

Install the proper tools

Next, write an Ansible playbook with the suggested structure for putting files, templates, and instructions to get everything installed and started in the proper order.

The resulting Ansible playbook fragment (of nyrr_podman_provisioning.yaml) shows how to ensure Podman is present before all the other tasks are run.

- name: Tasks to get a NYRR race dashboards up and running
  hosts: localhost
  tasks:
  - name: Toolchain preparation
    block:
      - name: Install Podman
        ansible.builtin.dnf:
          name: podman
          state: installed
        become: true
    tags: toolchain

[ Get the YAML cheat sheet ]

Run the playbook to install Podman:

ansible-playbook --tags toolchain roles/races/nyrr_podman_provisioning.yaml

Create the web application

You need a web app to display the CSV results over HTTP. The application is relatively simple:

  • Serve a static file over HTTP; the Python 3.11 slim image will work well.
  • Drop the race results CSV file into a directory on your machine (my directory is ~/fitness), and the web server will pick it up. You can use my nyrr_org_results.csv sample file to get started.
  • To visualize the results, you will use containerized Grafana with a few preinstalled plugins.

Chicken-and-egg problem with the Grafana container

There is a classic "which came first, the chicken or the egg?" problem here because:

  1. You need a Grafana instance running to set up a data source and implement the dashboard, but you want to be able to restore your Grafana instance quickly, most likely on a more powerful server.
  2. You could create a data source first, then spin up a Grafana container and implement the dashboard. But that is not how things are done in reality.

You can see that Ansible is not necessary for the test Grafana instance, but some automation is desired (for things like installing custom plugins).

[ Related reading: How to customize Grafana dashboards using Ansible ]

I run my home_fitness web server and the Grafana containers from the same location. I mount the CSV from the home directory, so I need to pass a special SELinux flag so that the container can use the mounted file.

$ podman run --name nyrr_server --publish 8080:8080 --security-opt label=disable --volume /home/josevnz/fitness/nyrr_org_results.csv:/mnt/nyrr_org_results.csv:ro --restart=always --detach --workdir /mnt python:3.11-slim python3 -m http.server 8080
1c0c23ad9240c8b606bb1c9e8d585b89e1b7718d15c81088f25533f34c3b03b6

Then using a tool like curl, you can try downloading the processed copy of your data. In my example, it is the NYRR race results:

$ curl --fail --verbose --silent --url http://localhost:8080/nyrr_org_results.csv
*   Trying 127.0.0.1:8080...
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /nyrr_org_results.csv HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.85.0
> Accept: */*
> 
* Mark bundle as not supporting multiuse
* HTTP 1.0, assume close after body
< HTTP/1.0 200 OK
< Server: SimpleHTTP/0.6 Python/3.11.1
< Date: Thu, 02 Feb 2023 02:30:45 GMT
< Content-type: text/csv
< Content-Length: 1775
< Last-Modified: Thu, 02 Feb 2023 01:16:26 GMT
< 
Event Name,Event Date,Distance,Finish Time,Pace,Gun Time,Overall Place,Gender Place,Age-Group Place,Age-Graded Time,Age-Graded Place,Age-Graded Percent
2018 NYRR Virtual GOOAAALLL 5K,07/15/2018,5 km,0:24:06,07:46,0:24:06,516,469,59,0:22:02,375,58.97
2018 NYRR Virtual Global Running Day 1M,06/10/2018,1 mile,0:06:31,06:31,0:06:31,203,182,23,0:05:56,138,62.57
2016 Abbott Dash to the Finish Line 5K,11/05/2016,5 km,0:23:27,07:33,0:29:40,1055,819,94,0:21:44,723,59.76
ING New York City Marathon 2013,11/03/2013,Marathon,4:27:18,10:12,4:29:56,27160,19238,4046,4:19:57,21389,47.30
NYC Half 2013,03/17/2013,Half-Marathon,1:42:31,07:50,1:50:39,2322,1760,277,1:39:42,1989,58.56
Grete's Great Gallop in Support of AKTIV Foundation,10/14/2012,Half-Marathon,1:48:34,08:17,1:51:27,1508,1154,237,1:45:02,1267,56.00
NYRR Fifth Avenue Mile Presented by Nissan,09/22/2012,1 mile,0:05:59,05:59,9:59:48,1241,1116,204,0:05:42,1116,65.19
Fitness Games Men,09/15/2012,4 miles,0:29:07,07:17,0:29:55,480,480,97,0:27:43,535,61.00
Percy Sutton Harlem 5K Run,08/25/2012,5 km,0:23:50,07:41,0:25:10,975,771,130,0:22:42,803,57.00
Achilles Hope & Possibility,06/24/2012,5 miles,0:38:38,07:44,0:39:05,386,326,75,0:37:18,409,57.17
Celebrate Israel,06/03/2012,4 miles,0:30:04,07:31,0:31:52,856,748,143,0:28:38,829,59.00
UAE Healthy Kidney 10K,05/12/2012,10 km,0:49:15,07:56,0:51:29,1886,1567,308,0:47:49,1728,55.89
New York Colon Cancer Challenge 15K,04/01/2012,15 km,1:12:47,07:49,1:13:13,651,544,118,1:09:53,594,59.00
NYRR Gridiron Classic,02/05/2012,4 miles,0:30:28,07:37,0:33:45,1173,960,174,0:29:15,1099,57.80
Joe Kleinerman 10K,01/07/2012,10 km,0:55:05,08:52,0:59:35,2423,1708,316,0:52:51,1850,51.00
NYRR Dash to the Finish Line (5K),11/05/2011,5 km,0:23:36,07:36,0:25:32,593,471,90,0:22:40,562,57.28
* Closing connection 0

Here is a simpler way, which you will use for the container self health check:

$ python3 -c 'import urllib.request; import sys; urllib.request.urlopen("http://localhost:8080/nyrr_org_results.csv").readlines(); sys.exit(0)' && echo "SUCCESS"|| echo "FAILURE"

You will do something similar to spin up a throwaway Grafana instance for development.

Create a throwaway Grafana instance

Create another container, this time for Grafana:

$ podman run --rm --detach --publish 3000:3000 --name=grafana_test --env "GF_INSTALL_PLUGINS=marcusolsson-csv-datasource,marcusolsson-json-datasource" grafana/grafana-oss:latest
porman logs --follow grafana_test

The container can be customized further to include required plugins and persistent volumes. The installation data will be preserved when you restart it.

It's time to visualize the race results.

[ Get hands-on with Podman in this tutorial scenario. ]

Create and provision the data source and dashboard

Once the instance is up, you can log in the first time with the user admin and password admin. Then, add a CSV data source (example shown below):

Image
Add a CSV datasource
(Jose Vincente Nunez, CC BY-SA 4.0)

You design the dashboard interactively. Keep designing until you end up with something you like.

Here are the steps for my race results, but your example will be specific to your dataset:

  1. Show a time series with the age-graded percentile (best, worst).
  2. Display a table showing overall race results with the ability to filter them.

A detailed tutorial on how to create and best practices for dashboards is beyond the scope of this tutorial, but this image displays the results:

Image
Race results dashboard
(Jose Vincente Nunez, CC BY-SA 4.0)

To treat this as code, export the dashboard definition as JSON and save it to a file:

Image
Export the race results dashboard
(Jose Vincente Nunez, CC BY-SA 4.0)

To provision it as code, tell Grafana where it can get the dashboards (default.yaml):

apiVersion: 1
providers:
  - name: Jose dashboards providers
    type: file
    updateIntervalSeconds: 10
    options:
      path:
        /etc/grafana/provisioning/dashboards
      foldersFromFilesStructure: true

The dashboard you exported earlier (for example, NYRR-1675298041762.json) will also be copied to the correct directory.

Next, take care of the data source. You can define the data source as a YAML file on a particular directory (nyrr_race_results_datasouce):

apiVersion: 1
datasources:
  - name: DS_NYRR.ORG_RESULTS
    type: marcusolsson-csv-datasource
    uid: "{{ datasource_id }}"
    url: "http://{{ nyrr_service_host }}:8080/{{ nyrr_race_results_file }}"
    editable: true
    jsonData:
      storage: http

You can use Jinja expressions to refine the data source's behavior, like passing overrides to the plugin. Do it with a special csv.yaml plugin file:

---
apiVersion: 1
apps:
  - type: marcusolsson-csv-datasource
    org_id: 1
    disabled: false
    jsonData:
      allow_local_mode: true

All these artifacts are deployed on an external volume mounted by the Grafana container.

To recap, my example now has:

  1. A web server to provide the race results over HTTP.
  2. A CSV data source that can ingest the CSV file.
  3. A dashboard to display the results.

Your content should consist of similar resources specific to your data.

It's time to provision all the pieces using Ansible.

[ Download now: A system administrator's guide to IT automation. ]

Launch the containers from Ansible

Ansible offers support for Podman, so you can augment the previous playbook with extra instructions.

In the end, my nyrr_podman_provisioning.yaml looks like this:

---
- name: Tasks to get a NYRR race dashboards up and running
  hosts: localhost
  vars:
    race_results_dir: /home/josevnz/raceresults # Update this path
    grafana_data_dir: /home/josevnz/grafana_raceresults_data # Update this path
    nyrr_race_results_file: nyrr_org_results.csv
    datasource_id: 948e72a8-a6cc-11ed-a1ef-1c8341284421 # uuidgen --time
    nyrr_service_host: dmaf5.home # Replace with your host name
    nyrr_service_port: 8080
    nyrr_service_tag: 3.11-slim
    grafana_plugins: marcusolsson-csv-datasource,marcusolsson-json-datasource
    grafana_service_port: 3000
  tasks:
    - name: Toolchain preparation
      tags: toolchain
      ansible.builtin.dnf:
        name: podman
        state: installed
      become: true
    - name: Provision data services
      tags: data_services
      block:
        - name: Copy NYRR race results to {{ race_results_dir }}
          tags: copy_service_files
          ansible.builtin.copy:
            dest: "{{ race_results_dir }}/"
            src: "{{ nyrr_race_results_file }}"
            directory_mode: a+xr
            mode: a+r,u+w
        - name: Python3 image
          tags: pull_service_image
          containers.podman.podman_image:
            name: python
            tag: "{{ nyrr_service_tag }}"
        - name: Launch containers required to see the race results
          tags: launch_nyrr_data
          containers.podman.podman_container:
            init: true
            name: "nyrr_server"
            image: "python:{{ nyrr_service_tag }}"
            command: "python3 -m http.server {{ nyrr_service_port }}"
            state: started
            security_opt:
              label=disable
            restart_policy: "always"
            detach: true
            workdir: /mnt
            rm: false
            ports:
              - "{{ nyrr_service_port }}:{{ nyrr_service_port }}"
            expose:
              - "{{ nyrr_service_port }}"
            volumes:
              - "{{ race_results_dir }}/{{ nyrr_race_results_file }}:/mnt/{{ nyrr_race_results_file }}:ro"
            healthcheck: "python3 -c 'import urllib.request; import sys; urllib.request.urlopen(\"http://localhost:{{ nyrr_service_port }}/{{ nyrr_race_results_file }}\").readlines(); sys.exit(0)'"
            healthcheck_retries: 3
            healthcheck_start_period: "10s"
    - name: Provision visualization services
      tags: visualization_services
      block:
        - name: Copy Grafana artifacts to final location
          tags: copy_visualization_files
          ansible.builtin.copy:
            dest: "{{ race_results_dir }}/"
            src: "{{ nyrr_race_results_file }}"
            directory_mode: a+xr
            mode: a+r,u+w
        - name: Grafana image
          tags: pull_service_image
          containers.podman.podman_image:
            name: "grafana/grafana-oss"
            tag: "latest"
        - name: Provisioning directories {{ grafana_data_dir }}
          tags: grafana_provision_dirs
          ansible.builtin.file:
            path: "{{ grafana_data_dir }}/provisioning/{{ item }}"
            mode: "ugo+xr,u+w"
            state: "directory"
            recurse: true
          loop:
            - access-control
            - alerting
            - dashboards/racing
            - datasources
            - notifiers
            - plugins
        - name: Launch Grafana container
          tags: launch_grafana
          containers.podman.podman_container:
            init: true
            name: "grafana_races"
            image: "grafana/grafana-oss:latest"
            state: started
            security_opt:
              label=disable
            restart_policy: "always"
            detach: true
            rm: false
            env:
              GF_INSTALL_PLUGINS: "{{ grafana_plugins }}"
            ports:
              - "{{ grafana_service_port }}:{{ grafana_service_port }}"
            expose:
              - "{{ grafana_service_port }}"
            volumes:
              - "{{ grafana_data_dir }}/provisioning:/etc/grafana/provisioning:rw"
        - name: Deploy files to provision directories
          tags: files_grafana
          ansible.builtin.copy:
            dest: "{{ grafana_data_dir }}/{{ item | replace('files/grafana/', '') }}"
            src: "{{ item }}"
            mode: a+r,u+w
          loop:
            - files/grafana/provisioning/dashboards/racing/NYRR-1675298041762.json
        - name: Deploy templates to provision directories
          tags: templates_grafana
          ansible.builtin.template:
            dest: "{{ grafana_data_dir }}/{{ item | replace('templates/grafana/', '') | replace('.j2', '') }}"
            src: "{{ item }}"
            mode: a+r,u+w
          loop:
            - templates/grafana/provisioning/dashboards/default.yaml.j2
            - templates/grafana/provisioning/datasources/nyrr_race_results_datasource.yaml.j2
            - templates/grafana/provisioning/plugins/csv.yaml.j2

You'll need to change some paths for this to work in your environment.

Run the Ansible playbook

Below is a capture of how the entire provisioning process looks on my computer, plus a quick inspection of the two running containers:

Race results dashboard in action

View the video of my freshly created Grafana instance.

Wrapping up

There are other visualization tools out there. I like Grafana because it is open source, the workflow is easy to understand, and the results are clear. Feel free to try anything else with your data.

You don't need a database to manage your data. You can use a simple CSV or JSON file, but you can always make your data store more sophisticated.

The Grafana ecosystem has other tools to help you manage your data sources as code, allowing you to automate even more of your process. In particular, I like grafyaml as it is well-documented and mature.

If you need to perform more complex tasks while provisioning your Grafana instances with Ansible, explore the Ansible Grafana project. It uses the Grafana REST API to perform complex provisioning tasks more easily (keep in mind the project seems stale).

Finally, support your local non-profit racing club. For example, NYRR promotes youth running and has helped to build community relationships with people all over the world. If you like running, please consider participating in its organized races, donating, or volunteering.

Topics:   Ansible   Podman   Monitoring   Data visualization  
Author’s photo

Jose Vicente Nunez

Proud dad and husband, software developer and sysadmin. Recreational runner and geek. More about me

Try Red Hat Enterprise Linux

Download it at no charge from the Red Hat Developer program.