[Freeipa-interest] Re: [Freeipa-devel] Feedback requested on Audit piece of IPA

Gunnar Hellekson gunnar.hellekson at redhat.com
Fri Jul 25 13:57:38 UTC 2008


Sorry I'm coming back to this so late...

On 17-Jul-2008, at 8:49 AM, Dmitri Pal wrote:
> Gunnar Hellekson wrote:
>> On 16-Jul-2008, at 8:19 PM, David O'Brien wrote:
>>> Karl Wirth wrote:
>>>> So far, we have identified that an audit system in general should:
>>>> * Collect data from different sources
>>>> * Consolidate data into combined storage
>>>> * Provide effective tools to analyze the collected data
>>>> * Archive collected data, including signing and compression
>>>> * Restore data from archives for audit or analysis
>>>>
>>>> We need your feedback on a couple of questions:
>>>> 1) Should we store structured log data for analysis, the original
>>>> log data, or both?
>>>> - To analyze the log data, it would be better to structure it and
>>>> store it in that form.
>>>> - But structured data is not the same as the original log file it
>>>> was taken from. Do we need the original log file format for
>>>> compliance reasons, or can we throw it away?
>>>> - Storing both parsed and unparsed data will have a significant
>>>> storage impact.
>>>>
>>> I'm just a beginner, but my first reaction here is: how is this
>>> going to affect a forensics situation? Shouldn't we always have
>>> access to the untouched raw data? We can parse it and create
>>> whatever structure is required on demand, but if we do it
>>> immediately and trash the original data, there's no going back.
>>
>> That's right. The user should always have the option of keeping the
>> raw data. Often there are requirements to maintain that data on
>> write-once media, etc., so I don't think they'd take kindly to our
>> summarily trashing it. It would be great if we could accommodate the
>> more hard-core folks, or folks who'd like the raw data for
>> third-party log-eating tools. I feel pretty strongly that we should
>> at least have the option of maintaining the original log file
>> format. We can then let the raw logs be managed via logrotate rules
>> for retiring, compression, signing, etc. This may mean that they do
>> not get touched at all, which is what some customers want.
>>
> But this means that we will have to store twice as much data. It
> will be terabytes! Is this what customers want? It would require
> high-end hardware to process these logs if we want to provide any
> kind of analysis and correlation. This will be a trade-off. We can
> collect raw data -- not a big deal; I just wanted to be sure that
> this is really the case.

You're right. In fact, these customers already have big resource  
problems, but it's not something we want to fix. They're accustomed to  
this problem :)
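
To make the "store both" option concrete, here's a rough sketch in
Python of parsing a syslog line into a structured record while keeping
the raw line verbatim. The regex and field names are illustrative, not
an actual IPA schema:

    # Keep the raw line alongside the parsed record, so forensics can
    # always fall back to the untouched original.
    import re

    SYSLOG_RE = re.compile(
        r'^(?P<timestamp>\w{3} +\d+ [\d:]+) '   # e.g. "Jul 25 13:57:38"
        r'(?P<host>\S+) '
        r'(?P<tag>[^:\[]+)(?:\[(?P<pid>\d+)\])?: '
        r'(?P<message>.*)$'
    )

    def parse_event(raw_line):
        """Return a structured record that carries the original line."""
        m = SYSLOG_RE.match(raw_line)
        record = m.groupdict() if m else {'message': raw_line}
        record['raw'] = raw_line    # original preserved for compliance
        return record

    print(parse_event('Jul 25 13:57:38 ipa1 sshd[4242]: Accepted password for admin'))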

>>>> 2) Should we parse the data into a structured format locally, or
>>>> back on the IPA server?
>>>> - Parsing locally and passing both the parsed and the original log
>>>> data will increase network traffic but reduce load on the server.
>>
>> The a priori "forensic expert" in me is suspicious of munging data  
>> on the client.  It seems as though we're solving a problem  
>> destructively, since we lose the ability to verify the original  
>> data. What happens if there's a bug in the parser? If we're  
>> supporting this, it should be optional.
> OK, we will provide an optional capability to preserve raw data.
> What about filtering? The problem with filtering is that you need to
> parse and sort out the raw data, so you end up with both raw data and
> parsed data. Then you apply a filter to the parsed data and decide,
> based on the central policy, whether the event is of interest to you.
> If it is not of interest, you throw it away. Is that a valid use
> case, or in reality do we need to collect everything and not filter
> anything? If we collect everything, there is no need to parse on the
> client, and thus there is only raw data to transfer to the central
> location; we can do the parsing there. This approach saves processing
> time on the client and reduces network traffic but puts more burden
> on the server. We can build different architectures that provide the
> same set of features; the question is really which use case is
> primary, because we should optimize the system for it.
> I see two main use cases:
> a) The customer wants to collect and preserve the original data,
> untouched and unfiltered. Search and analysis capabilities are
> secondary.
> b) The customer wants to collect data for effective processing and
> analysis. Filtering is crucial; raw data is optional and not that
> important.
> We are not talking about real-time log monitoring and intrusion
> detection. There is a separate product for that (Prelude + IDS) and
> we do not want to duplicate it.
> If we have to solve both use cases, we seem to get the worst of both
> worlds: a lot of processing, and a lot of data to transfer and store.
> If we can pick which use case dominates, we can tune the design to
> solve it best. Is that possible, or are the two use cases equally
> important?

I'm not sure how much tuning we really need here -- I think we can
solve all the cases by having a filter on the client and a filter on
the server, with both filters optional and centrally managed. Even
better, include a store-and-forward mechanism on the client so we can
elegantly handle the disconnected case, and I think everyone's happy.
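
A minimal sketch of that client-side piece, again in Python. The spool
path, the policy hook, and send_to_server() are all hypothetical
stand-ins, not an existing IPA interface:

    # Optional client-side filter plus store-and-forward spooling.
    import json, os

    SPOOL = '/var/spool/audit-forward/queue'    # hypothetical path

    def send_to_server(record):
        """Stand-in for the real transport to the central collector."""
        print('forwarded:', record.get('message', ''))

    def forward(record, policy=None):
        """Filter by central policy (if any); spool on network failure."""
        if policy is not None and not policy(record):
            return                              # dropped by policy
        try:
            send_to_server(record)
        except OSError:                         # disconnected: store...
            with open(SPOOL, 'a') as spool:
                spool.write(json.dumps(record) + '\n')

    def flush_spool():
        """...and forward once the server is reachable again."""
        if not os.path.exists(SPOOL):
            return
        with open(SPOOL) as spool:
            for line in spool:
                send_to_server(json.loads(line))
        os.remove(SPOOL)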

>>>> 3) What is the scope of what should be included in the audit
>>>> data, beyond what we will get from syslog, rsyslog, auditd, etc.?
>>>> Those will give us data like user access to a system, keystrokes,
>>>> and so on. What beyond that is needed? For example, do we need the
>>>> files a user accessed on a system?
>>
>> Between a keystroke logger, syslog and auditd, that takes care of  
>> just about everything, including a log of the files a user accessed  
>> on a system.
>
> The problem is more about other platforms. On Linux we have auditd,
> inotify, and a lot of other nice things that would help. But how do
> we monitor file changes on Solaris, HP-UX, or AIX? We want to
> collect logs from all kinds of machines. Do we need to worry about
> this and build audit collection tools for those systems? Which tools
> are a priority? How far do we need to go?

I'm not sure we have to introduce any additional features on clients  
we don't own. We should be able to consume what they have to offer,  
but we don't need to go implementing new features to homogenize. Is  
that what you mean?
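
To illustrate "consume what they have to offer": the collector can
normalize each platform's native audit format into the same structured
record, with one parser per source. Only the Linux auditd parser below
is written against a real format; the other dispatch entries are
placeholders:

    # Normalize native audit sources instead of installing new tooling
    # on clients we don't own.
    import re

    AUDITD_FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

    def parse_auditd(raw_line):
        """Parse the key=value pairs of a Linux auditd record."""
        record = {k: v.strip('"') for k, v in AUDITD_FIELD.findall(raw_line)}
        record['raw'] = raw_line
        return record

    PARSERS = {
        'linux-auditd': parse_auditd,
        # 'solaris-bsm': parse_bsm,  # hypothetical, against praudit output
        # 'aix-audit':   parse_aix,  # hypothetical, against the AIX format
    }

    line = 'type=SYSCALL msg=audit(1216993058.123:42): uid=500 comm="vi" key="etc-watch"'
    print(PARSERS['linux-auditd'](line))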

g


-- 
Gunnar Hellekson, RHCE
Lead Architect, Red Hat Government