ASCII representations of audit events

John D. Ramsdell ramsdell at mitre.org
Sat Mar 18 16:47:38 UTC 2006


Let me offer a design principle for tools that analyze audit logs, and
report their results by displaying audit records.  Irrespective of the
contents of the audit log, these tools should generate a 7-bit ASCII
representation of each audit record.

Consider the poor guy accessing a computer with a terminal.  If an
audit record contains binary data, and the person performs a query
using an audit tool, binary data in the answer could contain an escape
sequence that puts the terminal into a bizarre mode.  This happens to
me when I connect to a Linux machine using PuTTY, and read mail that
contains Chinese characters.  Damn spam!

Binary data can occur in logs for unexpected reasons.  For example, a
log file can become corrupted, or something that is not a log file can
accidentally be used as one.  Furthermore, someone with bad intentions
can carefully add binary data designed to use terminal escapes to hide
their tracks.

Once one is carefully quoting field values, it becomes easy to offer
multiple formats.  Let me propose two ASCII representations of audit
events, one that is very similar to what is produced by ausearch, and
a scripting language friendly version, in which each audit record is a
sequence of tab separated values.

In both formats, an audit event is started by a line of text with
three hyphen characters.  In the tab separated values format, the
names and the values that make up a record are separated by a tab
character.  Each name or value is quoted using the C string literal
syntax.  Letters, digits, and space characters are formatted
unmodified.  Characters that can be represented with character
escapes, such as the tab and newline characters, are formatted using a
character escape, with the exception of apostrophe and question mark,
which are formatted unmodified.  Also formatted unmodified are the
graphics characters: !#%^&*(_)-+=~[]|;:{},.<>/.  The remaining
characters are formatted using three digit octal numeric escapes.
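
As a rough illustration of the quoting rules just described, here is a
minimal Python 3 sketch.  The names PLAIN, ESCAPES, and quote_item are
made up for this example and are not taken from emit.c; the sketch
assumes the input is a byte string and that the character escapes are
the standard C ones.

```python
import string

# Bytes that are formatted unmodified: letters, digits, space,
# apostrophe, question mark, and the listed graphics characters.
PLAIN = set((string.ascii_letters + string.digits + " '?"
             + "!#%^&*(_)-+=~[]|;:{},.<>/").encode("ascii"))

# Standard C character escapes (apostrophe and question mark are
# deliberately absent because they are formatted unmodified).
ESCAPES = {
    0x07: "\\a", 0x08: "\\b", 0x0c: "\\f", 0x0a: "\\n",
    0x0d: "\\r", 0x09: "\\t", 0x0b: "\\v",
    ord("\\"): "\\\\", ord('"'): '\\"',
}

def quote_item(data):
    """Return the 7-bit ASCII quoting of a byte string (hypothetical helper)."""
    out = []
    for byte in bytearray(data):
        if byte in PLAIN:
            out.append(chr(byte))
        elif byte in ESCAPES:
            out.append(ESCAPES[byte])
        else:
            # Everything else becomes a three digit octal escape.
            out.append("\\%03o" % byte)
    return "".join(out)
```

With this sketch, a terminal escape such as ESC [ 2 J comes out as the
harmless text \033[2J, and a character outside the listed set, such as
the dollar sign, comes out as \044.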

In the ausearch-like format, each name is separated from its value
with an equal sign, and name-value pairs are separated by a space
character.  A name or a value is formatted unmodified if it contains
only characters that are formatted unmodified in tab separated value
format, and does not contain an equal sign or a space character.
Otherwise, it is formatted as in tab separated value format surrounded
by double quotes.
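
The rule for the ausearch-like format might be sketched in Python 3 as
follows.  This is only an illustration under the same assumptions as
the description above: quote_item, format_item, and format_record are
hypothetical names, and the quoting helper is included here so the
sketch runs on its own.

```python
import string

# Bytes that are formatted unmodified in tab separated value format.
PLAIN = set((string.ascii_letters + string.digits + " '?"
             + "!#%^&*(_)-+=~[]|;:{},.<>/").encode("ascii"))
ESCAPES = {
    0x07: "\\a", 0x08: "\\b", 0x0c: "\\f", 0x0a: "\\n",
    0x0d: "\\r", 0x09: "\\t", 0x0b: "\\v",
    ord("\\"): "\\\\", ord('"'): '\\"',
}

def quote_item(data):
    """Quote a byte string as in tab separated value format."""
    return "".join(
        chr(b) if b in PLAIN else ESCAPES.get(b, "\\%03o" % b)
        for b in bytearray(data))

def format_item(data):
    """Format one name or value for the ausearch-like output."""
    # Double quotes are needed if any byte would be escaped, or if the
    # item contains a space or an equal sign.
    needs_quotes = any(b not in PLAIN or b in b" ="
                       for b in bytearray(data))
    quoted = quote_item(data)
    return '"' + quoted + '"' if needs_quotes else quoted

def format_record(pairs):
    """Format a list of (name, value) byte string pairs as one line."""
    return " ".join(format_item(name) + "=" + format_item(value)
                    for name, value in pairs)
```

For instance, format_record([(b"type", b"SYSCALL"), (b"comm", b"my prog")])
yields the text type=SYSCALL comm="my prog".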

A name or value in tab separated value format is designed to be
scripting language friendly.  For example in Python, if the variable
item holds a value that contains a backslash, one obtains the binary
string it represents with the Python expression

    eval('"' + item + '"', {}, {}).
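
In Python 3 the expression above yields a text string rather than a
byte string.  A more cautious variant, sketched here under the
assumption that the item really is 7-bit ASCII in the quoted form
described earlier, uses ast.literal_eval with a bytes literal so that
nothing but a literal is ever evaluated.  The name unquote_item is
made up for this example.

```python
import ast

def unquote_item(item):
    """Return the bytes represented by a quoted name or value."""
    if "\\" not in item:
        # No escapes: the item stands for itself.
        return item.encode("ascii")
    # Wrap the item as a bytes literal and let the parser decode the
    # character and octal escapes.
    value = ast.literal_eval('b"' + item + '"')
    if not isinstance(value, bytes):
        # A malformed item could parse as some other literal.
        raise ValueError("not a quoted item: %r" % item)
    return value
```

Unlike eval, literal_eval raises an exception if the wrapped text is
anything other than a literal, which is a useful property when the log
contents are not trusted.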

Audit events represented as tab separated values are easily consumed
in Python.  A simple loop does the job.

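import sys

def filter():
    seq = None      # A sequence of tables representing an audit event
    lineno = 0
    seqno = 0       # Line number of the current event's start marker
    while True:
        line = sys.stdin.readline()
        if not line:                    # End of input
            if seq:
                consume(seq, seqno)     # consume() is supplied by the user
            return
        lineno = lineno + 1
        if line == "---\n":             # Start of a new audit event
            if seq:
                consume(seq, seqno)
            seq = []
            seqno = lineno
            continue
        if seq is None:                 # Record seen before any "---" line
            sys.stderr.write("No event start before line " + str(lineno) + "\n")
            sys.exit(1)
        # Remove only the newline; spaces inside values are significant.
        record = line.rstrip("\n").split("\t")
        nf = len(record)                # number of fields
        if nf % 2 != 0:
            sys.stderr.write("Bad field count on line " + str(lineno) + "\n")
            sys.exit(1)
        tab = {}                        # Maps each name to its value
        for i in range(0, nf, 2):
            tab[record[i]] = record[i + 1]
        seq.append(tab)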

C applications can easily generate both formats if they use the
following interface to generate their output.

#if !defined EMIT_H
#define EMIT_H

/* The emitters generate tab separated values when the flag is
   non-zero, otherwise name-value pairs are separated by an equal
   sign. */

void set_tsv_mode(int flag);

/* Emit an event start marker, the string "---\n". */

void emit_start_event(void);

/* Emit an end of record marker, a newline character. */

void emit_record_end(void);

/* Emit the field separator, a tab character when in TSV mode,
   otherwise a space character. */

void emit_field_separator(void);

/* Emit the name-value pair separator, a tab character when in TSV
   mode, otherwise an equal sign character. */

void emit_name_value_separator(void);

/* Emit a name or a value.  In TSV mode, the output is quoted using
   the C string literal syntax.  Letters, digits, and space characters
   are emitted unmodified.  Characters that can be represented with
   character escapes, such as the tab and newline characters, are
   printed using a character escape, with the exception of apostrophe
   and question mark, which are emitted unmodified.  Also emitted
   unmodified are the graphics characters: !#%^&*(_)-+=~[]|;:{},.<>/.
   The remaining characters are output using three digit octal numeric
   escapes.

   In non-TSV mode, a name or a value is emitted unmodified if it
   contains only characters that are emitted unmodified in TSV mode,
   and does not contain an equal sign or a space character.  Otherwise,
   it is emitted as in TSV mode surrounded by double quotes.

   A name or value emitted in TSV mode is designed to be scripting
   language friendly.  For example in Python, if the variable item
   holds a value that contains a backslash, one obtains the string
   it represents with the expression eval('"' + item + '"', {}, {}). */

void emit_item(const char *bytes);

#endif

The file emit.c that implements this interface is available in the
polgen CVS repository on SourceForge.

John

Those are my principles. If you don't like them, I have others. 
                                                      -- Groucho Marx 



