[Date Prev][Date Next]   [Thread Prev][Thread Next]   [Thread Index] [Date Index] [Author Index]

Re: aggregation/viewer question

LC Bruzenak wrote:
Has anyone been thinking about how to store/maintain the aggregated
audit data long-term?

In my setup, I will be sending data from several machines to one central
log host.

After a while, the number of logs/data will grow large. With hundreds of
files, rotation will take more time and the audit-viewer "select
source" option becomes tedious. Most of my searches involve
time/host/user. Using the prelude plugin helps a lot, because it
highlights what is otherwise hidden in the data pool. But pulling the
matching record out of a selection of log files isn't currently intuitive.

I would think we'd put these into an RDB or structure them in a
time-based directory hierarchy, something like year/month/week ... or
maybe something else entirely. I'm also thinking about ease of
backup/restore with incoming records. I'd hate to shut down all the
sending clients just to back up or restore my audit data, so that part
will need to operate asynchronously.

Before striking out on my own I thought I'd ask the list and see if
there are any such plans already in the works.

Yes, we plan on addressing many of these issues in IPA, not just for kernel audit data, but for all log data (e.g. Apache error logs, Kerberos access logs, SMTP logs, etc.). The basic idea is that there will be a central server which accepts log data from individual nodes. The log data can be signed for authenticity and will be robustly transported via AMQP with failover and guaranteed delivery. The log data will be compressed. You can specify which logs you want collected and their collection interval, along with record-level filtering.

Once on the server, the log metadata is entered into a "catalogue" (a relational database) which, along with the metadata, stores where the actual log data can be found on disk. The disk files will be optimized for compression and access. The catalogue manager will be able to reconstruct any portion of a log file (stream) from any node within a time interval. This can be used for external analysis tools, compliance reporting, etc. The catalogue will be capable of intelligently archiving old log data and restoring it back into a "live catalogue". This is what is planned for v2 of IPA, which is anticipated to be about one year from now.

In v3 of IPA the audit catalogue will support search and reporting on *all* the log data in the catalogue (not just audit.log but all log data). In v3, when data arrives at the catalogue it will be indexed for fast search and retrieval. Search will be based on tokens and key/value pairs and will accept constraints on nodes, time intervals, users, etc. (Note: a relational database will NOT be used to support searching; rather, searches will be performed via optimized reverse indexes on textual tokens. The RDB will be used only for managing the collection of log files.)
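To illustrate the catalogue idea, here is a toy sketch: a relational database holds only metadata (node, log name, time interval, on-disk location), while the bulk log data stays in ordinary files, and stream reconstruction becomes a range query. The schema, function names, and paths are hypothetical, not the actual IPA design.

```python
import sqlite3

# Toy "catalogue": the RDB stores metadata about log chunks; the
# chunks themselves live as compressed files on disk.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE catalogue (
        node      TEXT NOT NULL,
        log_name  TEXT NOT NULL,
        t_start   INTEGER NOT NULL,   -- epoch seconds, chunk start
        t_end     INTEGER NOT NULL,   -- epoch seconds, chunk end
        path      TEXT NOT NULL       -- where the chunk lives on disk
    )""")

def register_chunk(node, log_name, t_start, t_end, path):
    """Record one collected log chunk in the catalogue."""
    conn.execute("INSERT INTO catalogue VALUES (?,?,?,?,?)",
                 (node, log_name, t_start, t_end, path))

def chunks_for(node, log_name, lo, hi):
    """List, in time order, every chunk needed to reconstruct one
    node's log stream over the interval [lo, hi]."""
    cur = conn.execute(
        "SELECT path FROM catalogue WHERE node=? AND log_name=? "
        "AND t_end >= ? AND t_start <= ? ORDER BY t_start",
        (node, log_name, lo, hi))
    return [row[0] for row in cur]
```

Concatenating (and decompressing) the files returned by `chunks_for` would yield the reconstructed stream for external analysis or compliance reporting.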

A note about vocabulary: in "IPA land" when we say "audit data" or an "audit catalogue" or "audit search" the term "audit" refers to any log data, of which kernel audit data is just one subset.
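The reverse-index approach mentioned for v3 can be sketched in a few lines: each textual token maps to the set of records containing it, and an AND-query intersects those sets. This is a deliberately simplistic illustration (whitespace tokenization, in-memory sets); the real indexing would of course be far more sophisticated.

```python
from collections import defaultdict

index = defaultdict(set)   # token -> set of record ids
records = {}               # record id -> raw text

def add_record(rec_id, text):
    """Index a record under each of its (lowercased) tokens."""
    records[rec_id] = text
    for tok in text.lower().split():
        index[tok].add(rec_id)

def search(*tokens):
    """AND-query: ids of records containing every given token."""
    sets = [index.get(t.lower(), set()) for t in tokens]
    if not sets:
        return []
    return sorted(set.intersection(*sets))
```

Constraints on node, time interval, or user would simply become additional tokens or key/value terms intersected the same way, without any relational query.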
As a suggestion, the prewikka viewer seems like a workable model. I
realize that viewer is built around the IDS structure, but as an event
search tool it is pretty good and mostly complete. Having network access
to it is also a nice feature.

So right now I think that feeding the events into a DB and then using a
tool with the same capabilities as the prewikka viewer would be a
viable option. Others? Ideas?

Thanks in advance,

John Dennis <jdennis redhat com>
