Issue #5 March 2005

The security dilemma, part 1: Intrusion detection

Introduction

What do you do when your Linux system has been compromised? Where do you start? Whom do you call? These questions and more can send us into a flat spin when we find that we've been cracked. Often, our first reaction is "No way. Not me." Yes, denial. Followed by panic. "Is my data gone? Have I been kitted? Who is this person?" The first article of this two-part series focuses on what to do and what not to do, what to think about, and where to begin when you find that someone has gained unauthorized access to your Linux system.

Characterize the symptoms

Before you can realize that your system has been compromised, you must know how to determine the symptoms. Often, the symptoms are obvious. "I can't get into the database server." "The site is down." Sometimes, however, the symptoms are not so obvious. There are times when we don't even know that symptoms of unauthorized access are symptoms at all.

Clifford Stoll's experience with a cracker, detailed in The Cuckoo's Egg (Doubleday; 1989), tells us that a small bit of seemingly unrelated information can be our only clue. In Stoll's case, the trail was long and arduous, and the attacker had a sophisticated knowledge of UNIX. This is not always the case, and fortunately (or maybe unfortunately) attacks are rarely as sophisticated these days.

Modern attacks are often perpetrated by script kiddies, unskilled twerps who think they are achieving something ingenious when making a mess of a system or sometimes doing very little at all. These days, if we do encounter a skilled attacker, our system may be a stepping stone to something larger or a pawn in a distributed denial-of-service (DDoS) attack of some kind. The good news is that such attacks tend to generate forensic patterns. For instance, an attacker looking for alternate access may run SSH on a non-standard port and may modify firewall or service configurations. Monitoring can expose these changes, so awareness is important.
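As a rough illustration of spotting a service on a non-standard port, the sketch below parses text in the format of Linux's /proc/net/tcp (IPv4 only) and reports listening ports that are not on an expected list. The function names, the expected-port list, and the sample data are assumptions made for illustration; the sample is hypothetical and truncated to the first four columns, which are the only ones the code reads.

```python
def listening_ports(proc_net_tcp_text):
    """Return the set of TCP ports in LISTEN state from /proc/net/tcp-style text."""
    ports = set()
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        if state == "0A":  # 0A is the kernel's code for the LISTEN state
            # local_address looks like "00000000:0016"; the port is hex
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def unexpected_listeners(proc_net_tcp_text, expected_ports):
    """Return a sorted list of listening ports that are not in expected_ports."""
    return sorted(listening_ports(proc_net_tcp_text) - set(expected_ports))


# Hypothetical sample: ports 22, 2222, and 25 listening; one established
# connection (state 01) on port 22, which is not a listener.
SAMPLE = """\
  sl  local_address rem_address   st
   0: 00000000:0016 00000000:0000 0A
   1: 00000000:08AE 00000000:0000 0A
   2: 0100007F:0019 00000000:0000 0A
   3: 0A00000A:0016 0A000014:C350 01
"""

if __name__ == "__main__":
    # Assumed policy for this example: only SSH (22) and SMTP (25) should listen.
    print(unexpected_listeners(SAMPLE, {22, 25}))
```

On a live system you might feed the function the contents of /proc/net/tcp instead of the sample, and mail yourself the result whenever the list is not empty.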

Monitor for awareness

Monitoring is important for any well-run site and particularly important after a break-in. Still, monitoring a large number of servers for security can be a daunting task. Monitoring common areas of attack is a good practice that can and should be organic to everyday system administration.

Since the last thing you want to do after an intrusion is stay up all night watching the system, automation is essential, and it will make your job much easier. As we discussed in the section called “Characterize the symptoms”, typical attacks generate forensic patterns. It's the output of your automated monitoring that reveals these patterns, and because most attacks originate from network services, the logs associated with these services are the best place to start.

Typical Linux installs include tools like logwatch, which aggregate logs for easy monitoring. You can't be everywhere at all times, but you can automate data collection and send it to your mail queue or to your pager if necessary. Additionally, Unix commands like sar and other performance and system monitoring tools are naturals for security monitoring when used creatively. Make your monitoring innovative. The same old answer never solves a new problem.

Identify the target

Once you've characterized the symptoms of the attack, it's much easier to identify the target. For example, if your attacker modified services but didn't access a database on the same system, it's easier to conclude that the database wasn't the target of the attack.

Properly identifying the target also provides you with guidance as to what actions to take in cleaning up and securing your system. There's no need to spend hours measuring potential performance degradation after an intrusion if you can't positively identify that area as a target. Focus is key in evaluating your system for risk and damage.

After an intrusion, there is always a potential for after-attacks, i.e., booby traps. When an attack occurs, the first 14 days after the intrusion are a critical time to observe the behavior of the system. Look for variations in system utilization. A machine running excessively high CPU, memory, or network throughput should tip you off. More subtle changes such as increased I/O wait or network latency are also classic signs that something has changed. If you routinely spend hours logged in to a system, you may have a good feel for the performance of the box from a hands-on perspective.
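If you would rather not rely on feel alone, one simple way to look for variations in utilization is to compare recent samples against a baseline taken before the intrusion. The sketch below flags any recent sample that sits more than a chosen number of standard deviations from the baseline mean. The function name, the threshold of 3.0, and the CPU percentages are all assumptions made up for illustration; the numbers might come from a tool like sar, but nothing here parses sar output.

```python
from statistics import mean, stdev


def flag_anomalies(baseline, recent, threshold=3.0):
    """Return (index, value) pairs from `recent` that lie more than
    `threshold` standard deviations away from the mean of `baseline`."""
    mu = mean(baseline)
    sigma = stdev(baseline)  # needs at least two baseline samples
    if sigma == 0:
        # A perfectly flat baseline: any different value is a change.
        return [(i, v) for i, v in enumerate(recent) if v != mu]
    return [(i, v) for i, v in enumerate(recent)
            if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical CPU utilization percentages.
    before_intrusion = [10, 12, 11, 9, 13, 10, 11, 12]
    after_intrusion = [11, 14, 45, 12]
    print(flag_anomalies(before_intrusion, after_intrusion))
```

The same comparison works for memory, I/O wait, or network throughput, as long as the baseline was collected under a comparable workload.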

The attacker's choice of target can also give us insight into what other problems we may see while attempting to fix/rehabilitate the system or whether this attack fits current trends. As I mentioned, there exists a potential for booby traps, and these can be telegraphed by a change in the system or user environments. Check initial programs and login shells for all users when cleaning up an attack. Also pay special attention to what may be queued up in cron or at jobs. An attacker can leave a system without doing much damage, but leaving a malicious at job can destroy data.
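Checking login shells for all users lends itself to a small script. The sketch below reads text in the standard seven-field /etc/passwd format and lists accounts whose shell is not on an allowed list. The function name, the allowed list, and the sample accounts are hypothetical; your own list of acceptable shells will differ.

```python
def unexpected_shells(passwd_text, allowed_shells):
    """Return (user, shell) pairs whose login shell is not in allowed_shells."""
    findings = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) != 7:  # name:passwd:uid:gid:gecos:home:shell
            continue
        name, shell = fields[0], fields[6]
        if shell not in allowed_shells:
            findings.append((name, shell))
    return findings


# Hypothetical sample; the last entry has a shell in an odd location.
SAMPLE_PASSWD = """\
root:x:0:0:root:/root:/bin/bash
daemon:x:2:2:daemon:/sbin:/sbin/nologin
alice:x:500:500:Alice:/home/alice:/bin/bash
bob:x:501:501:Bob:/home/bob:/tmp/.hidden/sh
"""

if __name__ == "__main__":
    allowed = {"/bin/bash", "/sbin/nologin"}  # assumed site policy
    for user, shell in unexpected_shells(SAMPLE_PASSWD, allowed):
        print(user, shell)
```

This covers only login shells. Queued cron and at jobs still need to be reviewed separately.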

Another key area in evaluating your system is considering the extent of the damage. If the attacker's intention was to damage your system, you may find that it's easier, less time consuming, and more secure to simply rebuild your system from backups. Carrying on with a somewhat compromised system can leave weaknesses open for future attacks. Consider the cost of downtime versus future attacks.

Yet another area of focus, and a very important one, is what can be done about the damage to your system. Specifically, when an intrusion takes place, system administrators can get angry and spend disproportionate amounts of time rigging elaborate "revenge-based" traps for the attacker. The truth is that these days, attackers rarely return, and if they do, any hint that they're being watched or have been caught will cause them to disappear forever. This may sound like it's what you want, but in certain cases, the risk of not catching the intruder can exceed the cost of lost data. To evaluate this risk, you want to calculate your "risk of action."

Evaluate the risk of action

Once you've identified the target of the attack, you're likely to want to close off the system entirely until you get everything buttoned up. But there remains the question, what is my "risk of action?" Your risk of action is the relative risk of closing off the system before your intruder is caught. For the average Linux-at-home tinkerer, the answer is easy: close the door. But in a variety of situations, the answer to this question shifts as the value of data shifts. However, you can reduce your margin for error by calculating your risk of action (rA).

rA is a mathematical comparison of the value of your data loss versus the risk of future intrusions. rA is calculated with a simple equation:

rA = rI - vD

That is, the risk of taking action (rA) equals the risk of intrusion (rI) minus the value of your data (vD). Consider the following fictitious scenario:

An intrusion takes place on a system used by the US government for devising encryption algorithms. The owners of the system have concluded that the risk of not catching the intruder is higher than the value of the data because while the data itself is widely available throughout the academic community, the possession by certain persons outside the US could lead to the development of encryption that we are unable to break. In this case, rA is positive. When rA is positive, the best course of action is to leave the system open and hope that the intruder comes back so you can collect more information about him/her/them.

Now you may be saying to yourself, that's a honeypot. So what? The difference is that using rA can help you to arrive at an informed, well-thought-out decision about whether or not to leave your system susceptible to intruders while monitoring in a non-obtrusive way. You would want to mitigate your risk of having to rebuild the system by devaluing your data as much as possible. Clifford Stoll did something similar when he set up completely bogus information for his intruder to steal and in which the intruder took an interest.

Conversely, if rA is negative, you want to shut the system as soon as possible. For example, if an intrusion takes place at a pharmaceutical company, there may not be much doubt as to who would benefit from having the data. The data itself is highly valued. Therefore, a system owner may conclude that the value of their data always exceeds the risk of intrusion. In this case, rA is negative and the owner can confidently shut all doors to the system in the knowledge that he/she may be preventing further data loss even at the cost of downtime.
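The two cases above can be expressed as a short worked example. The sketch below assumes that rI and vD have been scored on the same scale, which the equation requires but which is a judgment call in practice. The 0 to 10 scale, the scores, and the function names are hypothetical. A result of exactly zero is treated as indeterminate, which is an assumption of this sketch and not part of the equation.

```python
def risk_of_action(risk_of_intrusion, value_of_data):
    """rA = rI - vD, with both inputs scored on the same scale."""
    return risk_of_intrusion - value_of_data


def recommend(r_a):
    """Map the sign of rA to the course of action described in the text."""
    if r_a > 0:
        return "leave the system open and monitor"
    if r_a < 0:
        return "shut the system"
    return "indeterminate"  # assumption: zero gives no guidance


if __name__ == "__main__":
    # Hypothetical scores on a 0-10 scale.
    scenarios = {
        "encryption research system": (8, 3),  # rI high, vD low
        "pharmaceutical company": (4, 9),      # rI lower, vD high
    }
    for name, (r_i, v_d) in scenarios.items():
        r_a = risk_of_action(r_i, v_d)
        print(name, r_a, recommend(r_a))
```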

There are, of course, rA grey areas. Recent news about cracked (or socially engineered) systems containing consumer credit information indicates that although the value of data is very high, the political price of intrusion is equally high. In such cases, the rA equation doesn't help much. In part two of this article, we'll discuss pro-actively evaluating a system's rA rating.

Consider, at this point, Heisenberg's Uncertainty Principle. Uncertainty certainly applies when an intruder is being monitored, because memory, processes, and files can change so rapidly that recording changes accurately is not possible without dramatically disturbing the operation of a system. For this reason, it's useful to follow the Order of Volatility (OOV). In Forensic Discovery (Farmer & Venema, 2005), the authors present OOV, or the expected life span of data. Following OOV when investigating an intrusion gives a greater chance of preserving the details that data collection itself can destroy. It allows you to capture data about the incident in question rather than the side effects of your data gathering. Refer to Table 1, “The Order of Volatility”.

Type of data                                 Life span
Registers, peripheral memory, caches, etc.   Nanoseconds or less
Main memory                                  Ten nanoseconds
Network state                                Milliseconds
Running processes                            Seconds
Disk                                         Minutes
Floppies, backup media, etc.                 Years
CD-ROMs, printouts, etc.                     Tens of years

Table 1. The Order of Volatility
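One way to put the table to work is to sort whatever you plan to collect so that the shortest-lived data is captured first. The sketch below does only that ordering. The evidence labels are hypothetical, and the category names are shortened from Table 1.

```python
# Categories from Table 1, most volatile first.
ORDER_OF_VOLATILITY = [
    "registers, peripheral memory, caches",
    "main memory",
    "network state",
    "running processes",
    "disk",
    "floppies, backup media",
    "CD-ROMs, printouts",
]


def collection_order(sources):
    """Sort (label, data_type) pairs so the most volatile type comes first.

    Raises KeyError if a data_type is not one of the Table 1 categories.
    """
    rank = {name: i for i, name in enumerate(ORDER_OF_VOLATILITY)}
    return sorted(sources, key=lambda source: rank[source[1]])


if __name__ == "__main__":
    # Hypothetical collection plan, listed in no particular order.
    plan = [
        ("image of the system disk", "disk"),
        ("process listing", "running processes"),
        ("open connections", "network state"),
        ("memory dump", "main memory"),
    ]
    for label, data_type in collection_order(plan):
        print(label, "-", data_type)
```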

System weakness

If you've characterized the symptoms of an attack, positively identified the target, and evaluated your risk of action, you have determined the area of weakness. Weakness doesn't refer specifically to a certain port or vulnerability, but rather to a characteristic method of gaining entry. A general point of entry, if you will.

These points of entry are usually concurrent conditions that precipitated a given system's vulnerability. For example, the point of entry might be a combination of vulnerabilities in MTA and firewall software. By classifying points of entry, we can identify trends and areas of vulnerability within systems. This may not help us much right after an attack, but it prepares us to effectively evaluate systems for the future. Our weaknesses become our strengths.

Conclusion

System administrators spend a significant part of each day monitoring their systems, installing security updates, and teaching users security prevention techniques. Their goal is to prevent intrusion from ever happening. But, if it does, awareness and focus are key to the recovery process.

You should strive to determine not only what happened, but when, how, and why your system was cracked. This knowledge gives you power and insight as you seek to prevent further intrusions. Although you may be "injured and wiser," you don't want there to be a "next time." Recovery should never be revenge but rather an extension of our own skills as system administrators.

Next month, part two of this series will focus on taking pro-active steps toward preventing intrusion.

Further reading

  • The Cuckoo's Egg by Clifford Stoll (Doubleday, 1989)
  • Forensic Discovery by Dan Farmer & Wietse Venema (Addison Wesley, 2005)
Editor's note:
For more articles about security, don't miss next month's issue of Red Hat Magazine. It will include the second part of this article and feature additional articles about system security such as one on implementing SELinux on Enteprise Linux 4.

About the author

Matt Frye is a Unix/Linux system administrator living in North Carolina. He is Chairman of the North Carolina System Administrators and is an active member of the Triangle Linux User Group. In his spare time, he enjoys fly fishing and mental Kung Foo.