Package software and data with self-extracting scripts
Sometimes, you need a quick and reliable way to distribute data or software to users without using a package manager (for example, the end user may not have root access to install an application).
This issue could be tackled by using containers and Podman or Docker, but what if they're not available on the destination system? What if one of the requirements is for the application to work on a bare-metal environment?
You could use Python with pip (and you probably know that you can package non-Python artifacts, too), but then you may face some installation limitations (a virtual environment, or the --user option), not to mention that you need boilerplate code to package your Python code.
So is everything lost? Fear not! In this article, I demonstrate a very small but effective technique to write a self-extracting script that doesn't require elevated privileges.
Set up your data
Damiaan Zwietering has a cool Git repository about coronavirus with data and visualizations (Jupyter notebooks and Excel spreadsheets), but no installer. Suppose you want to give this to your users, but they don't have access to Git. You can create a self-extracting installer for them.
In real life, you'd already have data that you want to distribute. So that you have some sample data to work with here, first clone this repository to your home directory:
$ git clone https://gitlab.com/dzwietering/corona.git
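There's now a lot of data and a not-so-shallow directory structure, but you can bundle it into a single compressed tarball with the git archive command. The $tempdir variable points to a scratch directory, for example one created with mktemp -d, and git archive infers gzip compression from the .tar.gz extension:
$ tempdir=$(mktemp -d)
$ cd $HOME/corona
$ git archive --verbose --output $tempdir/corona.tar.gz HEAD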
For the sake of this example, this tarball is the file you want to share with your users.
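If the data you want to ship isn't in a Git repository, plain tar can produce an equivalent tarball. This is a minimal sketch, assuming GNU tar; pack_dir is a hypothetical helper name. Unlike git archive, it stores members with a leading ./ (for example, ./measures) and picks up every file in the directory, so it excludes .git explicitly.

```shell
#!/usr/bin/env bash
# pack_dir (hypothetical helper): $1 = source directory, $2 = output .tar.gz
pack_dir() {
    local src=$1 out=$2
    # --directory makes the stored paths relative to $src;
    # --exclude must come before the "." operand to apply to it
    tar --create --gzip --file "$out" --directory "$src" --exclude=.git .
}
```

For example, pack_dir "$HOME/mydata" "$tempdir/mydata.tar.gz" gives you a file you can use in place of corona.tar.gz in the rest of this article.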
The self-extracting script's structure
The self-extracting script is split into the following sections:
- The code that helps users extract the data (the "payload")
- An anchor line separating the script from the data to be extracted
- The payload itself, which comes after the anchor; the script finds the anchor's position to know where the data starts
Bash, as it turns out, is pretty good at defining a script this way.
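To see the three sections in isolation before tackling the real installer, here is a toy sketch. It assumes GNU coreutils and grep; make_toy_installer and the __PAYLOAD__ marker are hypothetical names, the payload is one line of plain text rather than an encoded tarball, and it locates the marker with grep --line-regexp instead of the --max-count 2 approach used in the full script.

```shell
#!/usr/bin/env bash
# make_toy_installer (hypothetical helper): writes a tiny self-extracting script to $1
make_toy_installer() {
    local out=$1
    # Section 1: the code. The quoted 'EOF' delimiter keeps $0 and $((...))
    # from expanding now, so they are evaluated when the toy script runs.
    cat > "$out" <<'EOF'
#!/usr/bin/env bash
# Match the whole marker line exactly (-x), so this grep command line
# itself never matches; stop at the first match (-m 1)
start=$(grep --line-number --line-regexp --max-count 1 '__PAYLOAD__' "$0" | cut --delimiter : --fields 1)
# Print everything after the marker
tail --lines +"$((start + 1))" "$0"
# Stop here so the shell never tries to run the payload
exit 0
__PAYLOAD__
EOF
    # Section 3: the payload, appended right after the marker (section 2)
    printf 'hello from the payload\n' >> "$out"
    chmod u+x "$out"
}
```

Running make_toy_installer /tmp/toy.sh and then /tmp/toy.sh prints the payload line, hello from the payload.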
Create the payload
Here is an idea: Say the data you need to distribute is a directory containing many scripts and data files. You want to keep your permissions and structure intact, and you want the user to just "unpack" this into their home directory.
This sounds like a job for the tar command. But, for the sake of argument, say your users don't know how to use tar, or they want special options when installing the tarball (like extracting only a specific file).
Another issue is that your .tar archive is a binary file. If you want to send it by email, you have to encode it properly with Uuencode or Base64 so that it can be transmitted safely.
What to do? Don't throw away the .tar file yet. Instead, prepare it so you can append it to a Bash script (which you'll write shortly):
$ base64 $tempdir/corona.tar.gz > $tempdir/corona_payload
$ file $tempdir/corona_payload
/tmp/tmp.8QNdzdKEkG/corona_payload: ASCII text
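Base64 is lossless, so decoding the text gives back exactly the original bytes. A small sketch, assuming GNU coreutils base64 and GNU cmp (check_roundtrip is a hypothetical helper name), that confirms this for an archive before you embed it:

```shell
#!/usr/bin/env bash
# check_roundtrip (hypothetical helper): returns 0 if Base64-encoding and
# decoding $1 reproduces it byte for byte
check_roundtrip() {
    local archive=$1 encoded status
    encoded=$(mktemp)
    base64 "$archive" > "$encoded"
    # Decode the text back to binary and compare it with the original file
    base64 --decode "$encoded" | cmp --silent - "$archive"
    status=$?
    rm -f "$encoded"
    return "$status"
}
```

For example: check_roundtrip "$tempdir/corona.tar.gz" && echo "Payload survives the round trip"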
Extract data from a .tar file
You can either dump all of the contents into a new directory:
$ newbase=$HOME/coronadata
$ test ! -d $newbase && /bin/mkdir --parents --verbose $newbase
$ tar --directory $newbase \
--file $tempdir/corona.tar.gz --extract --gzip --verbose
Or you can extract just part of it, such as the measures, experiment, and test directories:
$ newbase=$HOME/coronadata
$ test ! -d $newbase && /bin/mkdir --parents --verbose $newbase
$ tar --directory $newbase --file $tempdir/corona.tar.gz \
--extract --gzip --verbose measures experiment test
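To extract only part of an archive, you need to know the member names first. A sketch, assuming GNU tar and coreutils and an archive whose paths have no leading ./ (as git archive produces); list_top_level is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# list_top_level (hypothetical helper): print the distinct top-level
# entries of the gzipped tarball $1
list_top_level() {
    local archive=$1
    # List every member, keep the part before the first "/", drop duplicates
    tar --list --gzip --file "$archive" | cut --delimiter / --fields 1 | sort --unique
}
```

Running list_top_level "$tempdir/corona.tar.gz" shows the names (such as measures or experiment) that you can pass to tar --extract.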
For this exercise, extract the whole thing into a user-chosen directory under a base directory (like $HOME), so the result is:
$HOME/$COVIDUSERDIR
Anatomy of the self-extracting script
Below is the code of my self-extracting script. You can save the script in your Git repository and reuse it for other deployments. Things to notice:
- SCRIPT_END is the line number where the payload starts inside the script.
- It sanitizes user input.
- Once you know where the payload starts, extract it from the script ($0), decode it back to binary, and then unpack it.
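#!/usr/bin/env bash
# Author: Jose Vicente Nunez
# Make a pipeline fail if any command in it fails, not only the last one
set -o pipefail
# The marker text appears twice in this file: in the grep command below and on
# the marker line at the end, so the second match is the real end of the script
SCRIPT_END=$(/bin/grep --max-count 2 --line-number ___END_OF_SHELL_SCRIPT___ "$0"| /bin/cut --fields 1 --delimiter :| /bin/tail -1)|| exit 100
# The payload starts on the line right after the marker
((SCRIPT_END+=1))
basedir=
while test -z "$basedir"; do
    # Abort instead of looping forever if stdin is closed (EOF)
    read -r -p "Where do you want to extract the COVID-19 data, relative to $HOME? (example: mydata -> $HOME/mydata. Press CTRL-C to abort):" basedir || exit 100
done
:<<DOC
Sanitize the user input. This is quite restrictive, so adjust it to the real application requirements.
DOC
CLEAN=${basedir//_/}
CLEAN=${CLEAN// /_}
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}
# Input such as "../.." sanitizes to an empty string, which would target $HOME itself
if [ -z "$CLEAN" ]; then
    echo "[ERROR]: '$basedir' is not a valid directory name"
    exit 100
fi
if [ ! -d "$HOME/$CLEAN" ]; then
    echo "[INFO]: Will try to create the directory $HOME/$CLEAN"
    if ! /bin/mkdir --parents --verbose "$HOME/$CLEAN"; then
        echo "[ERROR]: Failed to create $HOME/$CLEAN"
        exit 100
    fi
fi
# Skip the script, decode the Base64 payload, and unpack the gzipped tarball
if ! /bin/tail --lines +"$SCRIPT_END" "$0"| /bin/base64 -d| /bin/tar --file - --extract --gzip --directory "$HOME/$CLEAN"; then
    echo "[ERROR]: Failed to extract the payload into $HOME/$CLEAN"
    exit 100
fi
exit 0
# Here's the end of the script followed by the embedded file
___END_OF_SHELL_SCRIPT___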
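The SCRIPT_END line works because the marker text appears exactly twice in the script: once inside the grep command and once on the marker line, so the second match (tail -1) is the one that counts. If you'd rather not depend on that, one alternative (not what the script above does) is to match the whole line with awk. A sketch; payload_start is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# payload_start (hypothetical helper): print the line number right after the
# first line of $1 that is exactly $2; return 1 if there is no such line
payload_start() {
    local file=$1 marker=$2
    # NR is the current line number; compare the whole line, so a command
    # that merely mentions the marker never matches
    awk -v m="$marker" '$0 == m { print NR + 1; found = 1; exit } END { exit !found }' "$file"
}
```

Because payload_start already adds 1, a script using it as SCRIPT_END=$(payload_start "$0" ___END_OF_SHELL_SCRIPT___) would drop the ((SCRIPT_END+=1)) line.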
So how do you add the payload to the script? Just glue the two pieces together with a little bit of cat:
$ cat covid_extract.sh \
$tempdir/corona_payload > covid_final_installer.sh
Make it executable:
$ chmod u+x covid_final_installer.sh
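If you rebuild the installer often, you can script the encode, concatenate, and chmod steps. A sketch, assuming GNU coreutils; build_installer is a hypothetical helper name. It also checks that the extractor ends with a newline, because otherwise the first payload line would be glued onto the marker line and the script could no longer find it.

```shell
#!/usr/bin/env bash
# build_installer (hypothetical helper):
#   $1 = extractor script, $2 = .tar.gz payload, $3 = output installer
build_installer() {
    local extractor=$1 archive=$2 out=$3
    # $(...) strips a trailing newline, so a non-empty result means the
    # last byte of the extractor is something else
    if [ -n "$(tail --bytes 1 "$extractor")" ]; then
        echo "[ERROR]: $extractor does not end with a newline" >&2
        return 1
    fi
    # Script first, then the Base64-encoded payload
    { cat "$extractor"; base64 "$archive"; } > "$out" || return 1
    chmod u+x "$out"
}
```

For example: build_installer covid_extract.sh "$tempdir/corona.tar.gz" covid_final_installer.sh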
The installer is now the extraction script with the Base64-encoded payload appended to it, which is why the file is so big.
Run the installer
Does it work? Test it out for yourself:
$ ./covid_final_installer.sh
Where do you want to extract the COVID-19 data, relative to /home/josevnz? (example: mydata -> /home/josevnz/mydata. Press CTRL-C to abort):COVIDDATA
[INFO]: Will try to create the directory /home/josevnz/COVIDDATA
/bin/mkdir: created directory '/home/josevnz/COVIDDATA'
$ tree /home/josevnz/COVIDDATA
/home/josevnz/COVIDDATA
├── acaps_covid19_government_measures_dataset_0.xlsx
├── acaps_covid19_government_measures_dataset.xlsx
├── COVID-19-geographic-disbtribution-worldwide.xlsx
├── EUCDC.ipynb
├── experiment
...
Self-extracting installers are useful
I find self-extracting installers useful for many reasons.
First, you can make them as complicated or as simple as you want. The trickiest part is telling the script where the payload starts inside itself.
And it's useful to know this technique because malware installers can use it, too. Now you're more prepared to spot code like this in a script. Just as importantly, you now know how to prevent shell injection misuse by validating user input in your own self-extracting scripts.
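To check that validation logic without running the whole installer, you can wrap the script's three substitutions in a function and feed it hostile input. A sketch; sanitize_dirname is a hypothetical name, and the expansions are the same ones the script applies to basedir:

```shell
#!/usr/bin/env bash
# sanitize_dirname (hypothetical name): same expansions as the installer
sanitize_dirname() {
    local clean=${1//_/}             # drop any underscores the user typed
    clean=${clean// /_}              # turn spaces into underscores
    clean=${clean//[^a-zA-Z0-9_]/}   # drop everything that isn't alphanumeric or _
    printf '%s\n' "$clean"
}
```

With this logic, "my data" becomes my_data, "../../etc" becomes etc, and "a;rm -rf x" becomes arm_rf_x. Input such as "../.." becomes an empty string, which would point the extraction at $HOME itself, so treat an empty result as invalid.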
There are good tools out there to automate this, such as makeself. Give them a try (but check their code first).
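Checking a self-extracting script also means checking what it will unpack. This sketch lists the payload's contents without running the installer. It assumes the layout used in this article (a marker line followed by a Base64-encoded .tar.gz) plus GNU tar and coreutils; inspect_installer is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# inspect_installer (hypothetical helper): $1 = installer, $2 = marker line
inspect_installer() {
    local installer=$1 marker=$2 start
    # Line number right after the first line that is exactly the marker
    start=$(awk -v m="$marker" '$0 == m { print NR + 1; exit }' "$installer")
    if [ -z "$start" ]; then
        echo "[ERROR]: marker not found in $installer" >&2
        return 1
    fi
    # Decode the payload and list the archive members; nothing is extracted
    tail --lines +"$start" "$installer" | base64 --decode | tar --list --gzip --file -
}
```

For example: inspect_installer ./covid_final_installer.sh ___END_OF_SHELL_SCRIPT___. Read the lines above the marker, too, before you run anything.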
Jose Vicente Nunez
Proud dad and husband, software developer and sysadmin. Recreational runner and geek.