
Re: CPU spikes when migrating proprietary code from AS 2.1 to AS 3.0



On Thu, 2004-12-23 at 13:38 -0500, Sean Kirkpatrick wrote:
> Hello All,
> 
> We have observed CPU utilization spikes when porting our software from our current platform (Red Hat AS 2.1) to our new development platform (Red Hat AS 3.0). When running on the AS 2.1 multi-processor boxes, using kernel 2.4.9-e.3smp #1 SMP, everything behaves normally. However, when running on AS 3.0 multi-processor boxes, using kernel 2.4.21-27.ELsmp #1 SMP or 2.4.21-20.ELsmp #1 SMP, we noticed that the CPU utilization would periodically spike to the point where one of the processors was 100% consumed. This occurs whether hyper-threading is turned on or off. If we switch to the non-SMP kernel (2.4.21-27.EL #1), there are no CPU spikes.
> 
> The attached program demonstrates the problem. It can be run as a daemon (the default) or as a non-daemon (command-line option -no-daemon). It stats a nonexistent file 100 times every 10 milliseconds, using select to sleep. (I have also tried nanosleep, with the same results.) When run on the 2.4.21 SMP kernels, it quickly begins to accumulate CPU time, and the accumulation comes from these intermittent spikes rather than from a steady build-up. When run on the 2.4.9 SMP kernel or the 2.4.21 non-SMP kernel, there are no CPU spikes and no accumulation of CPU time.
> 
> My guess is that the problem ultimately has to do with the frequent 10 ms sleeps. However, I would like to know why it occurs only with the 2.4.21 SMP kernels and not with the 2.4.9 or 2.4.21 non-SMP kernels. I'm not sure whether it is related, but I have noticed that when running on the 2.4.9 or 2.4.21 non-SMP kernels the test app retains a priority of 15, whereas on the 2.4.21 SMP kernels it ends up at 25.
> 

Note that the kernel internally samples CPU usage every 10 ms. If your
program has its own periodic behavior at 10 ms, you can get interference
between the sampling and your code, with the process showing up as
100% busy.
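
To make that interference concrete, here is a rough sketch in C. It is a simplified model of tick-based sampling, not the kernel's actual accounting code, and the function name and parameters are invented for illustration. It assumes a task that runs for busy_us microseconds at the start of every period_us, shifted by phase_us, and a sampler that fires every tick_us and charges the whole tick to whatever is running at that instant.

```c
#include <stdio.h>

/*
 * Simplified model of tick-based CPU accounting (an assumption made for
 * illustration, not kernel code). The sampler fires at t = 0, tick_us,
 * 2 * tick_us, ... and counts a "busy" sample whenever the task is inside
 * its run window at that instant.
 *
 * Returns the fraction of samples that found the task running.
 */
double sampled_busy_fraction(long tick_us, long period_us, long busy_us,
                             long phase_us, long n_ticks)
{
    long hits = 0;

    for (long k = 0; k < n_ticks; k++) {
        long t = k * tick_us;
        /* Position of this sample inside the task's period, kept in
         * the range [0, period_us) even when t < phase_us. */
        long pos = ((t - phase_us) % period_us + period_us) % period_us;

        if (pos < busy_us)
            hits++;
    }
    return (double)hits / (double)n_ticks;
}
```

With a 10000 us tick, a 10000 us period, and a 200 us run window, the task really uses 2% of the CPU. Calling sampled_busy_fraction(10000, 10000, 200, 0, 1000) returns 1.0, because every sample lands inside the run window, while the same call with phase_us set to 5000 returns 0.0, because every sample misses it. Neither figure matches the real 2%, which is the kind of distortion you get when the program's period lines up with the sampling period.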

> 	
>     // pid is zero, so we are the child
>     int i = open("/dev/null", O_RDWR);
>     dup2(i, 0);
>     dup2(i, 1);
>     dup2(i, 2);
>     close(i);

btw, you do know about the daemon() function call that does a lot of
this work for you, right?
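
For reference, a minimal sketch of what that could look like. The helper name enter_background and its no_daemon flag are made up here to mirror the test program's -no-daemon option; only daemon() itself is a real library call (declared in <unistd.h> on glibc).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* ask glibc's <unistd.h> to declare daemon() */
#endif
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: detach from the terminal unless the caller asked
 * to stay in the foreground. Returns 0 on success, -1 on failure.
 */
int enter_background(int no_daemon)
{
    if (no_daemon)
        return 0;   /* stay attached; nothing to do */

    /*
     * daemon(nochdir = 0, noclose = 0) forks, lets the parent exit,
     * starts a new session in the child, changes directory to "/", and
     * redirects stdin, stdout, and stderr to /dev/null.
     */
    if (daemon(0, 0) == -1) {
        perror("daemon");
        return -1;
    }
    return 0;
}
```

Calling enter_background(0) early in main() would replace the fork, open, dup2, and close sequence quoted above; enter_background(1) leaves the process in the foreground.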



