Does redhat linux log all hardware events/issues/error in /var/log/mcelog?

Paul Tader ptader at linuxscope.com
Tue Mar 13 01:00:06 UTC 2012


On 3/12/12 5:28 PM, unix syzadmin wrote:
> Hi,
>
> We run redhat linux on intel hardware (mostly Dell, lately dell R710s).
> We want to be able to catch any hardware issues when they occur to act on
> them as quickly as possible.
>
> My understanding is that all hardware events/issues/errors are logged in
> /var/log/mcelog (Machine Check Events log).  Is this correct?  Can't stress
> this enough; does it log all hardware issues
> (cpu,memory,disk,ethernet,fibre/hba etc) ?
>
> Thanks,

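I've used mcelog to catch some CPU events, but it only records machine
check events (errors reported by the CPU, such as processor and memory
faults), so it won't cover disks, NICs or HBAs.  I think you might want
to check out Dell's OpenManage client.  It will report/monitor a lot
more information.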


http://linux.dell.com/wiki/index.php/Repository/OMSA


To install:

# wget -q -O - http://linux.dell.com/repo/hardware/latest/bootstrap.cgi | bash
# yum install srvadmin-base
# yum install srvadmin-storageservices

(Log out and back in for the environment variables to take effect.)

# /opt/dell/srvadmin/sbin/srvadmin-services.sh start
...

# omreport chassis
Health

Main System Chassis

SEVERITY : COMPONENT
Ok       : Fans
Ok       : Intrusion
Ok       : Memory
Ok       : Power Supplies
Ok       : Processors
Ok       : Temperatures
Ok       : Voltages
Ok       : Hardware Log
Ok       : Batteries

# omreport chassis temps
Temperature Probes Information

------------------------------------
Main System Chassis Temperatures: Ok
------------------------------------

Index                     : 0
Status                    : Ok
Probe Name                : System Board Ambient Temp
Reading                   : 20.0 C
Minimum Warning Threshold : 8.0 C
Maximum Warning Threshold : 42.0 C
Minimum Failure Threshold : 3.0 C
Maximum Failure Threshold : 47.0 C

# omreport storage pdisk controller=0

List of Physical Disks on Controller SAS 6/iR Integrated (Embedded)

Controller SAS 6/iR Integrated (Embedded)
ID                        : 0:0:0
Status                    : Ok
Name                      : Physical Disk 0:0:0
State                     : Online
Failure Predicted         : No
Certified                 : Not Applicable
Encryption Capable        : No
Secured                   : Not Applicable
Progress                  : Not Applicable
Bus Protocol              : SAS
Media                     : HDD
Capacity                  : 67.75 GB (72746008576 bytes)
Used RAID Disk Space      : 67.75 GB (72746008576 bytes)
Available RAID Disk Space : 0.00 GB (0 bytes)
Hot Spare                 : No
Vendor ID                 : DELL
Product ID                : ST973402SS
Revision                  : S229

<snip>

You get the idea.
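
Since the goal is to catch problems as they occur, the "omreport chassis"
summary could be filtered by a small script that prints only the
components whose severity is not "Ok".  This is a rough sketch, not
anything shipped with OMSA: the function name report_bad_components is
made up here, and it assumes the "SEVERITY : COMPONENT" layout shown
above.

```shell
#!/bin/sh
# Hypothetical helper (not part of OMSA): reads `omreport chassis`-style
# text on stdin and prints every "SEVERITY : COMPONENT" row whose
# severity is anything other than "Ok".
# Returns 1 if at least one such row was found, 0 otherwise.
# Assumes the two-column layout shown in the sample output above.
report_bad_components() {
    awk -F' *: *' '
        # Only rows with exactly two colon-separated fields are status rows;
        # skip the column header and anything already reporting Ok.
        NF == 2 && $1 != "SEVERITY" && $1 != "Ok" {
            print $1 " : " $2
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

It could then be run from cron along the lines of
"omreport chassis | report_bad_components", alerting whenever the exit
status is non-zero.  The same approach should work for other omreport
subcommands, though their output layouts differ and would need their own
parsing.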



