
Re: [Freeipa-interest] Re: [Freeipa-devel] Feedback requested on Audit piece of IPA



Gunnar Hellekson wrote:
On 16-Jul-2008, at 8:19 PM, David O'Brien wrote:
Karl Wirth wrote:
So far we have identified that an audit system in general should:
*Collect data from different sources
*Consolidate the data into combined storage
*Provide effective tools to analyze the collected data
*Archive collected data, including signing and compression
*Restore data from archives for audit or analysis purposes

We need your feedback on a couple of questions:
1) Should we store structured log data for analysis, the original log data,
or both?
- To analyze the log data, it would be better to structure it and store it.
- But structured data is not the same as the original log file it was taken
from. Do we need the original log file format for compliance reasons, or can
we throw it away?
- Storing both parsed and unparsed data will have a significant storage
impact.
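
For concreteness, here is a minimal sketch of what storing "both" could look
like per event, assuming Python and syslog-style input; the regex and field
names are illustrative, not a proposed schema:

import json
import re

# Keep the raw line verbatim alongside a parsed record so nothing is lost.
SYSLOG_RE = re.compile(
    r'(?P<timestamp>\w{3} +\d+ [\d:]+) (?P<host>\S+) (?P<tag>[^:]+): (?P<msg>.*)'
)

def structure(line):
    m = SYSLOG_RE.match(line)
    parsed = m.groupdict() if m else {"msg": line}   # fall back to raw text
    return {"raw": line, "parsed": parsed}

record = structure("Jul 16 20:19:01 ipa1 sshd[2412]: Accepted password for admin")
print(json.dumps(record, indent=2))

Note that every event carries its original bytes plus the parsed copy, which
is where the storage doubling comes from.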

I'm just a beginner, but my first reaction here is: how is this going to affect a forensics situation? Shouldn't we always have access to untouched/raw data? We can parse it and create whatever structure is required on demand, but if we do it immediately and trash the original data, there's no going back.

That's right. The user should always have the option of keeping the raw data. Often there are requirements to maintain that data on write-once media, etc., so I don't think they'd take kindly to summarily trashing it. It would be great if we could accommodate the more hard-core folks, or folks who'd like the raw data for third-party log-eating tools. I feel pretty strongly that we should at least have the option of maintaining the original log file format. We can then allow the raw logs to be managed via logrotate rules for retiring, compression, signing, etc. This may mean that they do not get touched at all, which is what some customers want.
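
As a rough illustration, a logrotate postrotate hook could hand each retired
log to a small helper like the one below (shown in Python; the path is a
placeholder, and the SHA-256 fingerprint stands in for a real signature made
with a proper key):

import gzip
import hashlib
import shutil

def retire(path):
    # Compress the rotated raw log byte-for-byte, then write a detached
    # SHA-256 fingerprint next to it so later tampering is detectable.
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    digest = hashlib.sha256()
    with open(gz_path, "rb") as f:
        digest.update(f.read())
    with open(gz_path + ".sha256", "w") as sig:
        sig.write(digest.hexdigest() + "\n")
    return gz_path

# retire("/var/log/audit/raw/messages.1")   # placeholder path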

But this means that we will have to store twice as much data. It will be terabytes! Is this what customers want? It would require high-end hardware to process these logs if we want to provide any kind of analysis and correlation. This will be a trade-off. We can collect raw data; that's not a big deal. I just wanted to be sure that this is really the case.

2) Should we parse the data into a structured format locally, or back on the
IPA server?
- Parsing locally and passing both parsed and original log data will
increase network traffic but reduce load on the server.

The a priori "forensic expert" in me is suspicious of munging data on the client. It seems as though we're solving a problem destructively, since we lose the ability to verify the original data. What happens if there's a bug in the parser? If we're supporting this, it should be optional.
OK, we will provide an optional capability to preserve raw data.
What about filtering? The problem with filtering is that you need to parse and sort the raw data; as a result you have both raw data and parsed data. Then you apply a filter to the parsed data and decide, based on the central policy, whether the event is of interest to you. If it is not of interest, you throw it away. Is this a valid use case, or in reality do we need to collect everything and not filter anything? If we collect everything, there is no need to parse on the client, and thus there is only raw data to transfer to the central location; we can do the parsing there. That approach saves processing time on the client and reduces network traffic, but adds more burden to the server. We can create different architectures that provide the same set of features; the question is more about which use case is primary. We should optimize the system for the primary use case.
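
To make that filtering path concrete, a minimal sketch (the policy format
here is invented purely for illustration):

# Invented policy format, purely for illustration.
POLICY = {
    "interesting_tags": {"sshd", "sudo"},
    "min_severity": 4,
}

def is_interesting(event, policy=POLICY):
    return (event.get("tag") in policy["interesting_tags"]
            and event.get("severity", 0) >= policy["min_severity"])

events = [
    {"tag": "sshd", "severity": 5, "msg": "Accepted password for admin"},
    {"tag": "cron", "severity": 2, "msg": "job started"},
]
to_send = [e for e in events if is_interesting(e)]   # only the sshd event survives

Everything the policy rejects never leaves the client, which is exactly the
trade-off between network traffic and server load described above.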
I see two main use cases:
a) The customer wants to preserve and collect the original data untouched, without filtering, and store it. Requirements for search and analysis capabilities are secondary.
b) The customer wants to collect data for effective processing and analysis. Filtering is crucial; raw data is optional and not that important.

We are not talking about real-time log monitoring and intrusion detection; there is a separate product for that (Prelude + IDS) and we do not want to duplicate it. If we have to solve both use cases above, we seem to end up with the worst of both worlds: a lot of processing and a lot of data to transfer and store. If we can pick which use case dominates, we can tune the design to solve it best. Is this possible, or are these two use cases equally important?


3) What is the scope of what should be included in the audit data, in
addition to what we will get from syslog, rsyslog, auditd, etc.? Those
will give us data like user access to a system, keystrokes, etc. What
beyond that is needed? For example, do we need the files a user
accessed on a system?

Between a keystroke logger, syslog and auditd, that takes care of just about everything, including a log of the files a user accessed on a system.

The problem is more about other platforms. On Linux we have auditd, inotify, and a lot of other nice things that would help, but how do we monitor file changes on Solaris, HP-UX, or AIX? We want to collect logs from all kinds of machines. Do we need to worry about this and build audit-data collection tools for those systems? Which tools are a priority? How far do we need to go?
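
One portable fallback for platforms without auditd or inotify would be plain
stat() polling; a sketch follows (paths and interval are illustrative, and
polling misses reads and rapid successive writes, so this only shows the
idea):

import os
import time

def watch(paths, interval=5.0):
    # Remember (mtime, size) per path and report whenever either changes.
    state = {p: None for p in paths}
    while True:
        for p in paths:
            try:
                st = os.stat(p)
                sig = (st.st_mtime, st.st_size)
            except OSError:            # file missing or unreadable
                sig = None
            if sig != state[p]:
                print("change detected:", p, sig)   # emit an audit event here
                state[p] = sig
        time.sleep(interval)

# watch(["/etc/passwd", "/etc/shadow"])   # illustrative paths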

Dmitri


g

--Gunnar Hellekson, RHCE
Lead Architect, Red Hat Government






--
Dmitri Pal
Engineering Manager
Red Hat Inc.