A long time ago in UNIX history, users on a server were actual UNIX users with entries in /etc/shadow, an interactive login shell, and a home directory. There were tools for admins to communicate with users and to monitor their activity, to avoid stupid or malicious mistakes that would cause server resources to be unfairly allocated.
These days, your user base is less likely to have entries in /etc/shadow. Instead, it’s managed by a layer of abstraction, whether that’s LDAP, Drupal, or OpenShift. Then again, there are a lot more servers now, which means there are a lot more sysadmins logging in and out to perform maintenance. Where there’s activity, there’s opportunity for mistakes and confusion, so it’s time to dust off those old monitoring tools and put them to good use.
Here are some of the monitoring commands you may have forgotten about (or never knew about) to help you track what’s been happening on your server.
First, the basics.
The who command is provided by the GNU coreutils package, and its primary job is to parse the /var/run/utmp file and report its findings.
The utmp file records the users currently logged in to the system. It doesn’t necessarily show every process, because not all programs initiate utmp logging. In fact, your system may not even have a utmp file by default. In that case, you can point who at /var/log/wtmp, which records all logins and logouts, by naming the file as an argument: who /var/log/wtmp.
The wtmp file format is exactly the same as utmp, except that a null user name indicates a logout and a terminal name of ~ indicates a system shutdown or reboot. The wtmp file is maintained by login(1), init(1), and some versions of getty(8). However, none of these programs creates the file, so if you remove wtmp, record-keeping is deactivated. That alone is good to know: if wtmp is missing, you should find out why!
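Because nothing recreates wtmp once it’s gone, a quick existence check is worth scripting. The sketch below is one way to do it, assuming the common Linux path /var/log/wtmp; the function name check_wtmp is hypothetical, not a standard command.

```shell
#!/bin/sh
# check_wtmp: hypothetical helper, not a standard command.
# Warns when a wtmp file is missing, because no program recreates it
# and login/logout history silently stops being recorded.
check_wtmp() {
    wtmp="${1:-/var/log/wtmp}"    # assumed default location on Linux
    if [ -e "$wtmp" ]; then
        echo "OK: $wtmp exists"
    else
        echo "WARNING: $wtmp is missing; find out why!" >&2
        return 1
    fi
}

# Example: check_wtmp || echo "login history is not being recorded"
```

Because it returns a non-zero status when the file is missing, it can be dropped into a cron job or monitoring script as a simple alert condition.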
The output of who --heading looks something like this:
NAME     LINE         TIME             COMMENT
seth     tty2         2020-01-26 18:19 (tty2)
larry    pts/2        2020-01-28 13:02 (10.1.1.8)
curly    pts/3        2020-01-28 14:42 (10.1.1.5)
This shows you the username of each person logged in, the terminal (LINE) they’re using, the time their login was recorded, and, for remote sessions, the IP address they connected from.
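If you want a quick per-user tally of sessions rather than the full listing, you can post-process who with standard text tools. This is a minimal sketch that assumes the first field of each who line is the user name, as in the output above; sessions_per_user is a hypothetical name.

```shell
#!/bin/sh
# sessions_per_user: hypothetical helper that counts login sessions
# per user. The first whitespace-separated field of each line printed
# by who is the user name; uniq -c counts repeats, sort -rn puts the
# busiest user first.
sessions_per_user() {
    who | awk '{ print $1 }' | sort | uniq -c | sort -rn
}

# Example: sessions_per_user
```

If you only need the total, GNU who also offers -q (--count), which prints the login names followed by the number of users.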
The who command also humbly provides the official POSIX way of discovering which user you are logged in as, but only if standard input is connected to a terminal:
$ who -m
curly    pts/3        2020-01-28 14:44 (10.1.1.8)
It also provides a mechanism to display the current runlevel:
$ who -r
         run-level 5  2020-01-26 23:58
For a little more context about users, the simple w command provides a list of who’s logged in and what they’re doing. This information is displayed in a format similar to the output of who, but w adds the time the user has been idle (IDLE), the CPU time used by all processes attached to the login TTY (JCPU), and the CPU time used by just the current process (PCPU). The user’s current process is listed in the final field (WHAT).
$ w
 13:45:48 up 29 days, 19:24,  2 users,  load average: 0.53, 0.52, 0.54
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
seth     tty2       Sun18   43:22m  0.01s  0.01s /usr/libexec/gnome-session-binary
curly    pts/2      13:02   35:12   0.03s  0.03s -bash
Alternatively, w can show each user’s IP address instead of a hostname; the w(1) man page lists the option for your version.
You can narrow the output down to a single user name by specifying which user you want information about:
$ w seth
 13:45:48 up 29 days, 19:27,  2 users,  load average: 0.53, 0.52, 0.54
USER     TTY        LOGIN@   IDLE   JCPU   PCPU WHAT
seth     tty2       Sun18   43:25m  0.01s  0.01s /usr/libexec/gnome-session-binary
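In scripts, the header lines get in the way. The procps-ng w accepts -h (--no-header) to suppress them, which leaves one line per session, each starting with the user name. The sketch below, using the hypothetical function name logged_in_users, turns that into a de-duplicated list of users.

```shell
#!/bin/sh
# logged_in_users: hypothetical helper listing each user with at least
# one session. -h (--no-header) removes the uptime line and the column
# headings, so every remaining line begins with a user name.
# Any extra arguments (such as a user name) are passed through to w.
logged_in_users() {
    w -h "$@" | awk '{ print $1 }' | sort -u
}

# Example: logged_in_users
```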
The utmpdump utility does (almost) exactly what its name suggests: it dumps the contents of the /var/run/utmp file to your screen. Actually, it dumps either the utmp or the wtmp file, depending on which you specify. Of course, the file you specify doesn’t have to be located in /var/log or even named wtmp, and it doesn’t even have to be in the right format. If you feed utmpdump a text file, it dumps the contents to your screen (or to a file, with the --output option) in a format that’s predictable and easy to parse.
Normally, of course, you would just use who or w to parse login records, but utmpdump is useful in many instances.
- Files can get corrupted. While who and w are often able to detect corruption themselves, utmpdump is even more tolerant because it does no parsing on its own. It renders the raw data for you to deal with.
- Once you’ve repaired a corrupted file, utmpdump can patch your changes back in.
- Sometimes you just want to parse data yourself. Maybe you’re looking for something that who and w aren’t programmed to look for, or maybe you’re trying to make correlations all your own.
Whatever the reason, utmpdump is a useful tool to extract raw data from the login records.
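As an example of pulling raw data yourself, the sketch below dumps a wtmp-format file and keeps only one user’s records. It assumes utmpdump’s usual one-record-per-line output, in which the user name sits in its own bracketed field that may be padded with spaces. The name user_logins is hypothetical, and the default path /var/log/wtmp is an assumption you can override.

```shell
#!/bin/sh
# user_logins: hypothetical helper that prints the raw utmpdump records
# belonging to one user. Usage: user_logins NAME [FILE]
user_logins() {
    user="$1"
    file="${2:-/var/log/wtmp}"    # assumed default; any wtmp-format file works
    # The user field may be padded with trailing spaces, so match "[NAME"
    # followed by optional spaces and "]". This avoids also matching
    # longer names that merely start with NAME.
    utmpdump "$file" | grep "\[$user *]"
}

# Example: user_logins larry
```

The name is used as a basic regular expression, so this sketch assumes ordinary user names without regex metacharacters.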
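If you have repaired a corrupted login log, you can use utmpdump to write your changes back to the master log. Run the whole command in a root shell, because with a plain sudo utmpdump the > redirection is performed by your own unprivileged shell and fails:
$ sudo sh -c 'utmpdump -r < wtmp.fix > /var/log/wtmp'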
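A repair usually starts from a text dump. The following sketch shows one way to produce an editable wtmp.fix while working on copies in the current directory. The helper name prepare_wtmp_fix and the file names wtmp.orig and wtmp.new are hypothetical, and it assumes $EDITOR names your text editor (falling back to vi).

```shell
#!/bin/sh
# prepare_wtmp_fix: hypothetical helper. Copies a wtmp-format file,
# dumps the copy to editable text (wtmp.fix), opens it in an editor,
# and then does a trial conversion back to binary (wtmp.new).
# Nothing under /var/log is modified.
prepare_wtmp_fix() {
    src="${1:-/var/log/wtmp}"
    cp "$src" wtmp.orig &&              # keep an untouched backup copy
    utmpdump wtmp.orig > wtmp.fix &&    # binary records -> text, one per line
    "${EDITOR:-vi}" wtmp.fix &&         # repair or delete damaged records
    utmpdump -r < wtmp.fix > wtmp.new   # trial conversion back to binary
}

# Example: prepare_wtmp_fix && utmpdump wtmp.new
```

If wtmp.new reads back cleanly, wtmp.fix is ready to be written back to /var/log/wtmp, and wtmp.orig remains available if anything goes wrong.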
Once you know who’s logged in on your system, you can use ps to get a snapshot of current processes. This isn’t to be confused with top, which displays a running report on current processes; ps takes a snapshot at the moment it’s issued and prints it to your screen. Each approach has advantages and disadvantages, so choose based on your requirements. Because of its static nature, ps is particularly useful for later analysis, or just as a nice manageable summary.
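Because a ps snapshot is static, it’s easy to keep for later analysis. Here is a minimal sketch that saves one to a timestamped file; the function name snapshot_ps, the column list, and the file-naming scheme are illustrative choices, not anything ps requires.

```shell
#!/bin/sh
# snapshot_ps: hypothetical helper that writes the current process list
# to a timestamped file and prints the file's path.
# Usage: snapshot_ps [DIRECTORY]   (defaults to the current directory)
snapshot_ps() {
    dir="${1:-.}"
    out="$dir/ps-$(date +%Y%m%d-%H%M%S).txt"
    # -e selects every process; -o picks the columns to record.
    ps -eo pid,ppid,user,pcpu,pmem,etime,args > "$out" && echo "$out"
}

# Example: snapshot_ps /tmp
```

Comparing two such files with diff is a simple way to see which processes appeared or disappeared between snapshots.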
The ps command is old and well known, and it seems many admins have learned the old UNIX command rather than the latest implementation. The modern ps (from the procps-ng package) offers many helpful mnemonics, and it’s what ships on RHEL, CentOS, Fedora, and many other distributions, so it’s what this article uses.
You can get all processes being run by a single user with the --user (-u) option, followed by the name of the user you want a report on. To see which process is the parent of each child process, add the --forest option for a “tree” view:
$ ps --forest --user larry
  PID TTY          TIME CMD
39707 ?        00:00:00 sshd
39713 pts/4    00:00:00  \_ bash
39684 ?        00:00:00 systemd
39691 ?        00:00:00  \_ (sd-pam)
For every process on the system:
$ ps --forest -e
[...]
29284 ?        00:00:48  \_ gnome-terminal-
29423 pts/0    00:00:00  |   \_ bash
42767 pts/0    00:00:00  |   |   \_ ps
39631 pts/1    00:00:00  |   \_ bash
39671 pts/1    00:00:00  |       \_ ssh
32604 ?        00:00:00  \_ bwrap
32612 ?        00:00:00  |   \_ bwrap
32613 ?        00:09:05  |       \_ dring
32609 ?        00:00:00  \_ bwrap
32610 ?        00:00:15      \_ xdg-dbus-proxy
 1870 ?        00:00:05 gnome-keyring-d
 4809 ?        00:00:00  \_ ssh-agent
[...]
The default columns are useful, but you can change them to better suit what you’re researching. The -o option gives you full control over which columns you see. For a full list of possible columns, refer to the Standard Format Specifiers section of the ps(1) man page.
$ ps -eo pid,user,pcpu,args --sort user
42799 root      0.0 [kworker/u16:7-flush-253:1]
42829 root      0.0 [kworker/0:2-events]
42985 root      0.0 [kworker/3:0-events_freezable_power_]
 1181 rtkit     0.0 /usr/libexec/rtkit-daemon
 1849 seth      0.0 /usr/lib/systemd/systemd --user
 1857 seth      0.0 (sd-pam)
 1870 seth      0.0 /usr/bin/gnome-keyring-daemon --daemonize --login
 1879 seth      0.0 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
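Sorting also works in descending order if you prefix the sort key with a minus sign, which makes a quick report on CPU usage easy. A sketch, assuming procps-ng ps; top_cpu is a hypothetical name and five rows is an arbitrary default.

```shell
#!/bin/sh
# top_cpu: hypothetical helper showing the N processes (default 5) with
# the highest CPU percentage. --sort=-pcpu sorts highest first; head
# keeps the header line plus N rows.
top_cpu() {
    n="${1:-5}"
    ps -eo pid,user,pcpu,pmem,args --sort=-pcpu | head -n "$((n + 1))"
}

# Example: top_cpu 10
```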
The ps command is very flexible. You can modify its output natively, so you don’t have to rely on awk to find what you care about. Craft a good ps command, alias it to something memorable, and run it often. It’s one of the top ways to stay informed about what’s happening on your server.
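For example, you might put something like the following in your ~/.bashrc. The alias name psu and the column choice are only a suggestion.

```shell
# ~/.bashrc fragment. "psu" is a hypothetical alias name: every process,
# grouped by user, with elapsed run time (etime) so that long-running
# jobs stand out.
alias psu='ps -eo pid,user,pcpu,pmem,etime,args --sort user'
```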
Sometimes you may already suspect a problematic process and need to investigate it rather than your users or system. For that, there’s the pgrep command from the procps-ng package.
At its most basic, pgrep works like grep on the output of ps:
$ pgrep bash
29423
39631
39713
Instead of listing the PIDs, you can just get a count of how many PIDs would be returned:
$ pgrep --count bash
3
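Because pgrep exits with status 0 when at least one process matches and 1 when none do, it also works as a condition in scripts. In this sketch, is_running is a hypothetical name, and -x restricts the match to processes whose name is exactly the given word rather than any name that merely contains it.

```shell
#!/bin/sh
# is_running: hypothetical helper; succeeds if any process is named
# exactly $1. pgrep exits 0 on a match and 1 when nothing matches.
is_running() {
    pgrep -x "$1" > /dev/null
}

if is_running sshd; then
    echo "sshd is running"
else
    echo "sshd is not running"
fi
```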
To narrow things down, you can filter the search by user name (-u), terminal (--terminal), age (--oldest), and more. For example, to find a process belonging to a specific user:
$ pgrep bash -u moe --list-name
39631 bash
You can even get inverse matches, listing the processes that don’t match your criteria (the pgrep(1) man page documents the option).
The companion to pgrep is the pkill command. It’s a lot like the kill command, except that it uses the same options as pgrep, so you can send signals to a troublesome process using whatever information is easiest for you.
For example, if you have discovered that a process initiated by user larry is monopolizing resources, and you know from the login records that larry is on terminal pts/2, then you can kill the login session and all of its children with just the terminal name:
$ sudo pkill -9 --terminal pts/2
Or you can use just the user name to end all processes owned by that user:
$ sudo pkill -u larry
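Since pkill acts immediately, it can help to preview the matches with pgrep first, and to escalate from the default SIGTERM to SIGKILL only for whatever refuses to exit. The sketch below wraps that sequence. The name end_processes is hypothetical; it passes its arguments straight to pgrep and pkill (so -u larry or --terminal pts/2 work as selection criteria), and the five-second grace period is arbitrary.

```shell
#!/bin/sh
# end_processes: hypothetical helper. Arguments are pgrep/pkill
# selection options, e.g. "-u larry" or "--terminal pts/2".
end_processes() {
    echo "Matching processes:"
    if ! pgrep --list-name "$@"; then
        echo "nothing matches"
        return 0
    fi
    pkill "$@"                        # pkill sends SIGTERM by default
    sleep 5                           # grace period for a clean exit
    if pgrep "$@" > /dev/null; then
        echo "still running; sending SIGKILL"
        pkill -KILL "$@"
    fi
}

# Example, from a root shell: end_processes -u larry
```

Signaling another user’s processes still requires root, so run the function from a root shell rather than prefixing individual commands inside it with sudo.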
pkill is a good “panic” button or sledgehammer-style solution when a problem has gotten out of hand.
Just because a set of commands exists in the terminal doesn’t mean it’s necessarily better than other solutions. Take stock of your requirements and choose the best tool for what you need. Sometimes a graphical monitoring and reporting system is exactly what you need, and other times terminal commands that are easily scripted and parsed are the right answer. Choose wisely, learn your tools, and you’ll never be in the dark about what’s happening within your bare metal.