EDAC error

Roger Heflin rogerheflin at gmail.com
Fri Mar 21 02:58:25 UTC 2008


Brent Snow, Mr. wrote:
> Hi All,
>
> I am having a problem with a new Dell PowerEdge 1900 server
> running Fedora 8.
>
> The system setup is as follows:
>
> 2 - Xeon E5310 (Quad-Core 1.6 GHz) processors
> 16 GB of RAM, 1 SATA 80 GB HDD.
>
> ------------------------------------------------------
>
> The error is as follows: EDAC i5000 MC0: nonfatal errors
> found 0=800.

Is that the only error you are getting? If EDAC is detecting enough memory
errors to slow the machine down, you should see an enormous number of EDAC
errors in either dmesg or the messages file.
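A rough way to do that count, sketched in Python. It assumes the kernel writes EDAC lines in the form quoted above ("EDAC <driver> MC<n>: ...") and that syslog goes to /var/log/messages as on a stock Fedora install. Both are assumptions, so check your own logs first:

```python
import re
from collections import Counter
from pathlib import Path

# Assumed message shape: "EDAC <driver> MC<n>: ...", e.g. the
# "EDAC i5000 MC0: nonfatal errors ..." line from the original report.
EDAC_RE = re.compile(r"EDAC\s+(\S+)\s+(MC\d+):")


def count_edac_messages(lines):
    """Count EDAC messages per (driver, memory controller) pair."""
    counts = Counter()
    for line in lines:
        match = EDAC_RE.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    log = Path("/var/log/messages")  # Fedora's syslog file; usually needs root
    try:
        with log.open(errors="replace") as fh:
            for (driver, mc), n in count_edac_messages(fh).most_common():
                print(f"{driver} {mc}: {n} messages")
    except OSError as exc:
        print(f"could not read {log}: {exc}")
```

Lines in any other format (for example, EDAC messages without a driver name) do not match the assumed pattern and are simply not counted.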

> The system runs very, very slowly (I have a P3 that is faster
> than this system).
>
> I have installed Windows Server 2003 x64 and it runs very,
> very quickly.
>
> There are no errors under Windows, and there are no errors
> reported by Dell's diagnostic tools.
>
> I have run Memtest86+ (for 96 hours) and no errors were
> detected there either.

Does the memtest program you are running actually register ECC errors for the
i5000 chipset? And is the ECC monitoring feature in memtest86 actually turned
on? It will show up in the menus; if it does not, then memtest is not
monitoring ECC errors and is useless for debugging this issue. If it does not
actually read those errors, you could be getting errors all over the place,
the hardware ECC would correct them, and memtest would think everything was
OK. I have seen this more than once.
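Under Linux, the EDAC driver keeps per-controller counts of corrected and uncorrected errors, so errors the hardware has quietly fixed still show up there. A minimal Python sketch, assuming the sysfs layout from the kernel's EDAC documentation (/sys/devices/system/edac/mc/mcN/ce_count and ue_count; other kernel versions may lay this out differently), reads those counters. A corrected count that keeps climbing points at the RAM even when memtest reports nothing:

```python
from pathlib import Path

# Assumed location of the EDAC memory-controller directories in sysfs.
EDAC_MC_ROOT = Path("/sys/devices/system/edac/mc")


def read_ecc_counters(root=EDAC_MC_ROOT):
    """Return {controller: (corrected, uncorrected)} for each mcN directory."""
    counters = {}
    if not root.is_dir():
        return counters  # no EDAC driver loaded, or a different sysfs layout
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            continue  # counter file missing or unreadable; skip this controller
        counters[mc.name] = (ce, ue)
    return counters


if __name__ == "__main__":
    for name, (ce, ue) in read_ecc_counters().items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
```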

> As soon as I install Fedora 8, the errors show up again and
> the system just bogs down.
>
> I have tried aliasing the EDAC files thinking that this may
> be the problem, but all that did was stop the log messages.

If EDAC is causing the problem and you don't actually have bad memory, then
you would need to remove the EDAC module and/or turn it off to stop the
errors. But I have seen new RAM so bad that it gave persistent errors on every
access. (This was the 2nd revision of memory from the company for a certain
motherboard; the 1st revision crashed under load within a very short time.)
That memory was about to be approved by the motherboard vendor (not Dell)
because they were using memtest86, which ignored the errors they were actually
getting since the hardware corrected them, so everything looked fine to them.

                                  Roger



