An introduction to Linux user account monitoring
A long time ago in UNIX history, users on a server were actual UNIX users with entries in /etc/shadow
, an interactive login shell, and a home directory. There were tools for admins to communicate with users, and to monitor their activity to catch careless mistakes or malicious behavior that would cause server resources to be unfairly allocated.
These days, your userbase is less likely to have entries in /etc/shadow
, instead being managed by a layer of abstraction, whether it’s LDAP or Drupal or OpenShift. Then again, there are a lot more servers now, which means there are a lot more sysadmins logging in and out to perform maintenance. Where there’s activity, there’s opportunity for mistakes and confusion, so it’s time to dust off those old monitoring tools and put them to good use.
Here are some of the monitoring commands you may have forgotten about (or never knew about) to help you track what’s been happening on your server.
who
First, the basics.
The who
command is provided by the GNU coreutils package, and its primary job is to parse the /var/run/utmp
file and report its findings.
The utmp
file logs the current users on the system. It doesn’t necessarily show every process, because not all programs initiate utmp
logging. In fact, your system may not even have a utmp
file by default. In that case, who
falls back upon /var/log/wtmp
, which records all logins and logouts.
The wtmp
file format is exactly the same as utmp
, except that a null user name indicates a logout and the ~
character indicates a system shutdown or reboot. The wtmp
file is maintained by login(1)
, init(1)
, and some versions of getty(8)
. However, none of these applications creates the file, so if you remove wtmp
, then record-keeping is deactivated. That alone is good to know: if wtmp
is missing, you should find out why!
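Because a missing wtmp silently disables record-keeping, a periodic check can be worthwhile. The following is only a minimal sketch: the function name check_login_log is hypothetical, and the path is passed as an argument because the file's location can vary between systems.

```shell
# Report whether a login record file exists.
# Prints a status line and returns non-zero when the file is missing.
check_login_log() {
    log=$1
    if [ ! -e "$log" ]; then
        echo "MISSING: $log"
        return 1
    fi
    echo "OK: $log"
}

# Example on a live system (path taken from the text above):
#   check_login_log /var/log/wtmp
```

A check like this could run from a scheduled job, with the non-zero exit status used to trigger an alert.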
The output of who --heading
looks something like this:
NAME LINE TIME COMMENT
seth tty2 2020-01-26 18:19 (tty2)
larry pts/2 2020-01-28 13:02 (10.1.1.8)
curly pts/3 2020-01-28 14:42 (10.1.1.5)
This shows you the username of each person logged in, the terminal line they are using, the time their login was recorded, and the host or IP address they connected from.
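Because who prints one line per session, its output is easy to summarize. Here is a small sketch that counts sessions per user from who-style lines on standard input; sessions_per_user is a made-up helper name, not part of coreutils.

```shell
# Count login sessions per user.
# Reads who-style lines on stdin; the first field is the user name.
sessions_per_user() {
    awk '{ count[$1]++ } END { for (u in count) print u, count[u] }' | sort
}

# Example on a live system:
#   who | sessions_per_user
```

Leave off --heading when piping into this helper, otherwise the header row is counted as a user.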
The who
command also humbly provides the official POSIX way of discovering which user you are logged in as, but only if utmp
exists:
$ who -m
curly pts/3 2020-01-28 14:42 (10.1.1.5)
It also provides a mechanism to display the current runlevel:
$ who -r
run-level 5 2020-01-26 23:58
w
For a little more context about users, the simple w
command provides a list of who’s logged in and what they’re doing. This information is displayed in a format similar to the output of who
, but adds the time the user has been idle, the CPU time used by all processes attached to the login TTY, and the CPU time used by just the current process. The user’s current process is listed in the final field.
Sample output:
$ w
13:45:48 up 29 days, 19:24, 2 users, load average: 0.53, 0.52, 0.54
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
seth tty2 Sun18 43:22m 0.01s 0.01s /usr/libexec/gnome-session-binary
curly pts/2 13:02 35:12 0.03s 0.03s -bash
Alternatively, you can view the user’s IP address with the -i
or --ip-addr
option.
You can narrow the output down to a single user name by specifying which user you want information about:
$ w seth
13:45:48 up 29 days, 19:27, 2 users, load average: 0.53, 0.52, 0.54
USER TTY LOGIN@ IDLE JCPU PCPU WHAT
seth tty2 Sun18 43:25m 0.01s 0.01s /usr/libexec/gnome-session-binary
utmpdump
The utmpdump
utility does (almost) exactly what its name suggests: it dumps the contents of the /var/run/utmp
file to your screen. Actually, it dumps either the utmp
or the wtmp
file, depending on which you specify. Of course, the file you specify doesn’t have to be located in /var/log
or even named utmp
or wtmp
, and it doesn’t even have to be in the right format. If you feed utmpdump
a text file, it dumps the contents to your screen (or a file, with the --output
option) in a format that’s predictable and easy to parse.
Normally, of course, you would just use who
or w
to parse login records, but utmpdump
is useful in many instances.
- Files can get corrupted. While who and w are often able to detect corruption themselves, utmpdump is even more tolerant because it does no parsing on its own. It renders the raw data for you to deal with.
- Once you’ve repaired a corrupted file, utmpdump can patch your changes back in.
- Sometimes you just want to parse data yourself. Maybe you’re looking for something that who and w aren’t programmed to look for, or maybe you’re trying to make correlations all your own.
Whatever the reason, utmpdump
is a useful tool to extract raw data from the login records.
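As an illustration of parsing the data yourself, here is a rough sketch that pulls active logins out of a dump. It assumes utmpdump's usual text layout of bracketed fields (type, PID, ID, user, line, host, address, time) and that record type 7 marks a user process. The helper name active_logins is hypothetical.

```shell
# Print "user line host" for USER_PROCESS (type 7) records.
# Reads utmpdump-style text on stdin, one bracketed record per line.
active_logins() {
    # Drop the opening brackets, then split each record on the closing ones.
    tr -d '[' | awk -F']' '
        $1 + 0 == 7 {
            user = $4; line = $5; host = $6
            # Fields are space-padded, so trim both ends.
            gsub(/^ +| +$/, "", user)
            gsub(/^ +| +$/, "", line)
            gsub(/^ +| +$/, "", host)
            print user, line, host
        }'
}

# Example on a live system:
#   utmpdump /var/log/wtmp | active_logins
```

Check the field order against a real dump from your own system before relying on it, because the layout is assumed here rather than taken from this article.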
If you have repaired a corrupted login log, you can use utmpdump
to write your changes back to the master log:
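# Run the redirection inside a root shell; a plain "sudo cmd > file" opens the file as your own user
$ sudo sh -c 'utmpdump -r < wtmp.fix > /var/log/wtmp'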
ps
Once you know who’s logged in on your system, you can use ps
to get a snapshot of current processes. This isn’t to be confused with top, which displays a running report on current processes; this is a snapshot taken the moment ps
is issued, and then printed to your screen. There are advantages and disadvantages to both, so you can choose which to use based on your requirements. Because of its static nature, ps
is particularly useful for later analysis, or just as a nice manageable summary.
The ps
command is old and well-known, and it seems many admins have learned the old UNIX command rather than the latest implementation. The modern ps
(from the procps-ng
package) offers many helpful mnemonics, and it’s what ships on RHEL, CentOS, Fedora, and many other distributions, so it’s what this article uses.
You can get all processes being run by a single user with the --user
(or -u
) option, along with the name of the user you want a report on. To give the output the added context of which process is the parent of a child process, use the --forest
option for a “tree” view:
$ ps --forest --user larry
PID TTY TIME CMD
39707 ? 00:00:00 sshd
39713 pts/4 00:00:00 \_ bash
39684 ? 00:00:00 systemd
39691 ? 00:00:00 \_ (sd-pam)
For every process on the system:
$ ps --forest -e
[...]
29284 ? 00:00:48 \_ gnome-terminal-
29423 pts/0 00:00:00 | \_ bash
42767 pts/0 00:00:00 | | \_ ps
39631 pts/1 00:00:00 | \_ bash
39671 pts/1 00:00:00 | \_ ssh
32604 ? 00:00:00 \_ bwrap
32612 ? 00:00:00 | \_ bwrap
32613 ? 00:09:05 | \_ dring
32609 ? 00:00:00 \_ bwrap
32610 ? 00:00:15 \_ xdg-dbus-proxy
1870 ? 00:00:05 gnome-keyring-d
4809 ? 00:00:00 \_ ssh-agent
[...]
The default columns are useful, but you can change them to better suit what you’re researching. The -o
option gives you full control over which columns you see. For a full list of possible columns, refer to the Standard Format Specifiers section of the ps(1) man page.
$ ps -eo pid,user,pcpu,args --sort user
42799 root 0.0 [kworker/u16:7-flush-253:1]
42829 root 0.0 [kworker/0:2-events]
42985 root 0.0 [kworker/3:0-events_freezable_power_]
1181 rtkit 0.0 /usr/libexec/rtkit-daemon
1849 seth 0.0 /usr/lib/systemd/systemd --user
1857 seth 0.0 (sd-pam)
1870 seth 0.0 /usr/bin/gnome-keyring-daemon --daemonize --login
1879 seth 0.0 /usr/libexec/gdm-wayland-session /usr/bin/gnome-session
The ps
command is very flexible. You can modify its output natively so you don’t have to rely on grep
and awk
to find what you care about. Craft a good ps
command, alias it to something memorable, and run it often. It’s one of the top ways to stay informed about what’s happening on your server.
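One way to act on that advice is to wrap a ps invocation in a small shell function. This sketch totals the CPU percentage per user; cpu_by_user is an invented name, and the summing is done with awk only because ps reports per-process values.

```shell
# Sum the %CPU column per user and list the heaviest user first.
# Reads "user pcpu" pairs on stdin, one process per line.
cpu_by_user() {
    awk '{ total[$1] += $2 }
         END { for (u in total) printf "%s %.1f\n", u, total[u] }' |
        sort -k2,2nr -k1,1
}

# Example on a live system. Each -o with a trailing "=" suppresses
# that column's header, so no header line reaches awk:
#   ps -e -o user= -o pcpu= | cpu_by_user
```

The two columns are requested with separate -o options because text after an equals sign is treated as the header for that column.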
pgrep
Sometimes, you may have some idea of a problematic process and need to investigate it instead of your users or system. To do that, there’s the pgrep
command from the procps-ng
package.
At its most basic, pgrep
works like a grep on the output of ps
:
$ pgrep bash
29423
39631
39713
Instead of listing the PIDs, you can just get a count of how many PIDs would be returned:
$ pgrep --count bash
3
To narrow your search, you can filter processes by user name (-u
), terminal (--terminal
), age (--newest
and --oldest
), and more. To find a process belonging to a specific user, for example:
$ pgrep bash -u moe --list-name
39631 bash
You can even get inverse matches with the --inverse
option.
pkill
Related to pgrep
is the pkill
command. It’s a lot like the kill
command, except that it uses the same options as pgrep
so you can send signals to a troublesome process using whatever information is easiest for you.
For example, if you have discovered that a process initiated by user larry
is monopolizing resources, and you know from w
that larry
is located on terminal pts/2
, then you can kill the login session and all of its children with just the terminal name:
$ sudo pkill -9 --terminal pts/2
Or you can use just the user name to end all processes matching it:
$ sudo pkill -u larry
Used judiciously, pkill
is a good “panic” button or sledgehammer-style solution when a problem has gotten out of hand.
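Because pkill shares its selection options with pgrep, a cautious habit is to preview the match list before sending any signal. A minimal sketch, with preview_kill as a hypothetical wrapper name:

```shell
# Show what a pkill invocation would match, without sending a signal.
# Pass only selection options here (for example -u or --terminal),
# not a signal such as -9. The -l flag is the short form of --list-name.
preview_kill() {
    pgrep -l "$@"
}

# Example on a live system:
#   preview_kill -u larry      # inspect the list first
#   sudo pkill -u larry        # then send the signal
```

If the preview lists something you did not expect, tighten the options before running pkill.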
Terminal monitoring
Just because a series of commands exists in a terminal doesn’t mean they’re necessarily better than other solutions. Take stock of your requirements and choose the best tool for what you need. Sometimes a graphical monitoring and reporting system is exactly what you need, and other times terminal commands that are easily scripted and parsed are the right answer. Choose wisely, learn your tools, and you’ll never be in the dark about what’s happening within your bare metal.
Seth Kenlon
Seth Kenlon is a UNIX geek and free software enthusiast.