피드 구독

Sometimes, you need a quick and reliable way to distribute data or software to users without using a package manager (for example, the end user may not have root access to install an application).

This issue could be tackled by using containers and Podman or Docker, but what if they're not available on the destination system? What if one of the requirements is for the application to work on a bare-metal environment?

You could use Python with pip (and you probably know that you can package non-Python artifacts, too), but then you may be faced with some installation limitations (a virtual environment, or the --user option), not to mention that you need boilerplate code to package your Python code.

So is everything lost? Fear not! In this article, I demonstrate a very small but effective technique to write a self-extracting script that doesn't require elevated privileges.

Set up your data

Damiaan Zwietering has a cool Git repository about coronavirus with data and visualizations (Jupyter books and Excel spreadsheets), but no installer. Suppose you want to give this to your users, but they don't have access to Git. You can create a self-extracting installer for your users.

In real life, you'd already have data that you want to distribute. But so you have some sample data to work with, first clone this repository to your home directory:

$ git clone https://gitlab.com/dzwietering/corona.git

There's now a lot of data and a not-so-shallow directory structure, but you can create a .tar file using the git archive command:

$ cd $HOME/corona
$ git archive --verbose --output $tempdir/corona.tar.gz HEAD

For the sake of this example, this tarball is the file you want to share with your users.

The self-extracting script's structure

The self-extracting script is split into the following sections:

  1. Code that helps users extract the data (the "payload")
  2. An anchor separating data (to be extracted) from the script
  3. The anchor position to extract the data that comes after it

Bash, as it turns out, is pretty good at defining a script this way.

[ Download the free Bash shell scripting cheat sheet. ]

Create the payload

Here is an idea: Say the data you need to distribute is a directory with many scripts and also data. You want to keep your permissions and structure intact, and you want the user to just "unpack" this into their home directory.

This sounds like a job for the tar command. But, for the sake of argument, say your users don't know how to use tar, or they want special options when installing the tarball file (like extracting only a specific file).

Another issue is that your .tar archive is a binary file. If you want to send it by email, you have to encode it properly with Uuencode or Base64 so that it can be transmitted safely.

What to do? Don't throw away the .tar file yet. Instead, prepare it so you can append it to a Bash script (which you'll write shortly):

$ base64 $tempdir/corona.tar.gz > $tempdir/corona_payload
$ file $tempdir/corona_payload
/tmp/tmp.8QNdzdKEkG/corona_payload: ASCII text

Extract data from a .tar file

You can either dump all of the contents into a new directory:

$ newbase=$HOME/coronadata
$ test ! -d $newbase && /bin/mkdir --parents --verbose $newbase
$ tar --directory $newbase \
--file corona.tar.gz --extract --gzip --verbose

Or you can extract just part of it, such as the measures, experiment, and test directories:

$ newbase=$HOME/coronadata
$ test ! -d $newbase && /bin/mkdir --parents --verbose $newbase
$ tar --directory $newbase --file corona.tar.gz \
--extract --gzip --verbose measures experiment test

For this exercise, extract the whole thing to a base directory (like $HOME), so the result is:

$HOME/$COVIDUSERDIR

[ You might also be interested in More tips for packaging your Linux software with RPM. ]

Anatomy of the self-extracting script

Below is the code of my self-extracting script. You can save the script in your Git repository and reuse it for other deployments. Things to notice:

  • SCRIPT_END is the position where the payload starts inside the script
  • It sanitizes user input
  • Once you figure out the position of the metadata, extract it from the script ($0), decode it back to binary, and then unpack it.
#!/usr/bin/env bash
# Author: Jose Vicente Nunez
SCRIPT_END=$(/bin/grep --max-count 2 --line-number ___END_OF_SHELL_SCRIPT___ "$0"| /bin/cut --field 1 --delimiter :| /bin/tail -1)|| exit 100
((SCRIPT_END+=1))
basedir=
while test -z "$basedir"; do
    read -r -p "Where do you want to extract the COVID-19 data, relative to $HOME? (example: mydata -> $HOME/mydata. Press CTRL-C to abort):" basedir
done
:<<DOC
Sanitize the user input. This is quite restrictive, so it depends of the real application requirements.
DOC
CLEAN=${basedir//_/}
CLEAN=${CLEAN// /_}
CLEAN=${CLEAN//[^a-zA-Z0-9_]/}
if [ ! -d "$HOME/$CLEAN" ]; then
    echo "[INFO]: Will try to create the directory $HOME/$CLEAN"
    if ! /bin/mkdir --parent --verbose "$HOME/$CLEAN"; then
        echo "[ERROR]: Failed to create $HOME/$CLEAN"
        exit 100
    fi
fi

/bin/tail --lines +"$SCRIPT_END" "$0"| /bin/base64 -d| /bin/tar --file - --extract --gzip --directory "$HOME/$CLEAN"

exit 0
# Here's the end of the script followed by the embedded file
___END_OF_SHELL_SCRIPT___

So how do you add the payload to the script? Just put together the two pieces with a little bit of cat glue:

$ cat covid_extract.sh \
$tempdir/corona_payload > covid_final_installer.sh

Make it executable:

$ chmod u+x covid_final_installer.sh

You can see how the installer combines with the payload. It's big because it contains the payload.

Run the installer

Does it work? Test it out for yourself:

$ ./covid_final_installer.sh 
Where do you want to extract the COVID-19 data, relative to /home/josevnz? (example: mydata -> /home/josevnz/mydata. Press CTRL-C to abort):COVIDDATA

[INFO]: Will try to create the directory /home/josevnz/COVIDDATA

/bin/mkdir: created directory '/home/josevnz/COVIDDATA'

$ tree /home/josevnz/COVIDDATA
/home/josevnz/COVIDDATA
├── acaps_covid19_government_measures_dataset_0.xlsx
├── acaps_covid19_government_measures_dataset.xlsx
├── COVID-19-geographic-disbtribution-worldwide.xlsx
├── EUCDC.ipynb
├── experiment
...

Self-extracting installers are useful

I find self-extracting installers useful for many reasons.

First, you can make them as complicated or simple as you want them to be. The most complex part is dictating where the script should extract the payload.

And it's useful to know this technique because malware installers can use it, too. Now you're more prepared to spot code like this in a script. Just as importantly, you now know how to prevent shell injection misuse by validating user input in your own self-extracting scripts.

There are good tools out there to automate this. Give them a try (but check their code first).


저자 소개

Proud dad and husband, software developer and sysadmin. Recreational runner and geek.

Read full bio
UI_Icon-Red_Hat-Close-A-Black-RGB

채널별 검색

automation icon

오토메이션

기술, 팀, 인프라를 위한 IT 자동화 최신 동향

AI icon

인공지능

고객이 어디서나 AI 워크로드를 실행할 수 있도록 지원하는 플랫폼 업데이트

open hybrid cloud icon

오픈 하이브리드 클라우드

하이브리드 클라우드로 더욱 유연한 미래를 구축하는 방법을 알아보세요

security icon

보안

환경과 기술 전반에 걸쳐 리스크를 감소하는 방법에 대한 최신 정보

edge icon

엣지 컴퓨팅

엣지에서의 운영을 단순화하는 플랫폼 업데이트

Infrastructure icon

인프라

세계적으로 인정받은 기업용 Linux 플랫폼에 대한 최신 정보

application development icon

애플리케이션

복잡한 애플리케이션에 대한 솔루션 더 보기

Original series icon

오리지널 쇼

엔터프라이즈 기술 분야의 제작자와 리더가 전하는 흥미로운 스토리