It all started when our quality engineering (QE) team realized that we were deploying our machines way too often. We couldn’t keep up with new build testing for our project.
To make things easier, we asked our virtualization QE team to help us understand how they automate their deployments. It turns out that they use Ansible roles, each written to perform a specific task. After talking to them, I spent a good amount of time understanding and automating Red Hat Virtualization for our team.
This article outlines how it went. You should know that I had not worked with Jenkins Pipeline before, so my work may not reflect best practices. I am still learning.
Setting up the files
In addition to my internal files, I turned to GitHub’s oVirt Ansible section. The particularly useful Ansible roles here were:
jenkins_files, I started by checking out the Git repo. Then, I installed the packages required on the Jenkins node. Note that some DNF and pip packages are required.
Ultimately, it took a lot of trial and error to figure out the correct packages and versions to use. Once everything worked in my local environment (on my computer), I started translating it all to Jenkins Pipeline.
Debugging our configuration playbook
Once that task was done, we Beaker-reprovisioned the hosts with a fresh install of our distribution, configured them to export NFS storage, installed the required repos, and then finally ran our deployment playbook. Once the oVirt-hosted engine was deployed, we ran our configuration playbook to set up our Red Hat Virtualization and machines with networks, storage, hosts, and ISO files as needed.
An issue I ran into during this phase was that a task in one of the playbooks used whoami to find out which user it should use to ssh from the hypervisor to the hosted engine, and that task incorrectly chose my personal user login instead of root. I raised that issue with the ovirt-ansible-hosted-engine contributors and they fixed it.
Once that problem was solved, I encountered another. The deployment now got past the previous failure and deployed the hosted engine correctly, but when run from Jenkins it kept failing with a cryptic error message that said little about what was wrong. With some digging, I found out that the authentication between the hypervisor host and the hosted engine was broken. I then added the he_root_ssh_pubkey attribute to my deployment and made sure the hypervisor had the correct private key deployed through Beaker. This fixed the second issue, and my playbook finally finished running engine-setup and completed the deployment.
Adding further enhancements
We also wanted to set predictable passwords for our virtualization tools’ database (ovirt_engine) and the data warehouse database (ovirt_engine_history), but the ovirt-hosted-engine-setup role did not allow that. I raised a pull request to fix that, and I am hoping to get it merged soon.
Two more things that I added are the ability to copy custom SSH keys to known_hosts on the hypervisors and the ability to skip deployment if we are already on the correct build. For the SSH part, I used the known_hosts module. This community module was slightly confusing, so I wrote my first pull request against the Ansible repo, extending the docs and examples for the known_hosts module. I hope that this addition gets merged soon, too.
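To make the known_hosts step more concrete, here is a minimal Python sketch of the idempotent behavior I wanted: add a host key line only if it is not already present. This is not the Ansible module's code, only a plain-Python illustration of the idea. The function name, the example host name, and the placeholder key are all hypothetical, and the sketch ignores hashed known_hosts entries.

```python
import tempfile
from pathlib import Path


def ensure_known_host(known_hosts: Path, entry: str) -> bool:
    """Append `entry` to the file unless an identical line is already there.

    Returns True if the file was changed, False if it was already up to date.
    """
    entry = entry.strip()
    text = known_hosts.read_text() if known_hosts.exists() else ""
    if entry in (line.strip() for line in text.splitlines()):
        return False
    known_hosts.parent.mkdir(parents=True, exist_ok=True)
    with known_hosts.open("a") as handle:
        # Keep the new entry on its own line even if the file lacks a trailing newline.
        if text and not text.endswith("\n"):
            handle.write("\n")
        handle.write(entry + "\n")
    return True


if __name__ == "__main__":
    # Work in a temporary directory so the demo never touches a real ~/.ssh.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "known_hosts"
        # Hypothetical host and placeholder key, for illustration only.
        line = "hypervisor.example.com ssh-ed25519 AAAA-placeholder-key"
        print(ensure_known_host(path, line))  # first call changes the file
        print(ensure_known_host(path, line))  # second call is a no-op
```

Returning whether anything changed mirrors how Ansible tasks report "changed" versus "ok", which keeps repeated pipeline runs quiet.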
To avoid redeploying when we were already on the same build, I decided to use the repodiff command, passing it the current build URL (fetched from the installed packages/repos) and the new build URL requested by the pipeline’s user. The repodiff command looks at both repos and compares their packages; if no packages were added, removed, or modified, the installed build and the build we were trying to install are the same.
If this was the case, we used our bash script and playbooks to skip the installation and jump right to the last phase of the pipeline: running smoke tests for our products.
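The skip decision can be sketched roughly as follows. This is a hedged Python illustration, not our actual bash script. It assumes that the repodiff output has already been captured as text and that it ends with summary lines of the form "Added Packages: 0"; the exact wording varies between repodiff versions, so the pattern is an assumption, and the function names are hypothetical.

```python
import re

# Matches summary lines such as "Added Packages: 0" or "Upgraded packages: 3".
# The exact wording of repodiff's summary is an assumption and varies by version.
_SUMMARY_LINE = re.compile(r"^\s*(\w+)\s+packages:\s*(\d+)\s*$", re.IGNORECASE)


def parse_repodiff_summary(output: str) -> dict:
    """Return {category: count} for every summary line found in the output."""
    counts = {}
    for line in output.splitlines():
        match = _SUMMARY_LINE.match(line)
        if match:
            counts[match.group(1).lower()] = int(match.group(2))
    return counts


def builds_are_identical(output: str) -> bool:
    """True only if a summary was found and every count in it is zero."""
    counts = parse_repodiff_summary(output)
    if not counts:
        # No summary at all: treat the result as unknown and redeploy to be safe.
        return False
    return all(count == 0 for count in counts.values())


if __name__ == "__main__":
    # Hand-written sample in the assumed format, not real repodiff output.
    sample = """Summary:
Added Packages: 0
Removed Packages: 0
Modified Packages: 0
"""
    if builds_are_identical(sample):
        print("Same build: skip deployment and go straight to smoke tests")
    else:
        print("Different build: redeploy")
```

Treating a missing summary as "different" is a deliberately conservative choice: an unnecessary redeployment costs time, while a wrongly skipped one would run smoke tests against the wrong build.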
In the end, this process was painful and challenging, but it was also rewarding to see it all come together. Take a look at one such pipeline run. Stage 3 failure is expected in some situations and is ignored.
Thanks for reading. This article is only an overview and does not cover all the details. If you are interested in learning more about how I achieved this, how we are planning to use it, or any of the code, please feel free to contact me.