RHEL 5.5 Oracle RAC cluster rebooted due to processor hang!!

Shashank bhides at gmail.com
Sat Jun 30 02:19:33 UTC 2012


Actually, the cluster checks are done over the private network, so the
loss of eth0 should not have crashed the server.

Do you see any logs in /var/crash? Is kdump/netdump set up? Can you
post the ocssd logs (they should be under the grid directory) for the
10-15 minutes before the crash?


Also post the /var/log/messages for 10-15 minutes prior to the crash.
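
If it helps with trimming the log, here is a minimal sketch of one way
to pull out just that window. It assumes Python 3 (the stock Python on
RHEL 5 is older, so run it against a copy of the file on another
machine) and the classic syslog timestamp format seen in your excerpt.
The function name and the crash time in the usage comment are
placeholders of mine, not anything from your system.

```python
from datetime import datetime, timedelta


def lines_before_crash(lines, crash_time, minutes=15):
    """Yield syslog lines stamped within `minutes` before crash_time (inclusive)."""
    start = crash_time - timedelta(minutes=minutes)
    for line in lines:
        try:
            # Classic syslog stamps ("Jun 10 19:22:34") carry no year,
            # so borrow the year from crash_time (an assumption).
            stamp = datetime.strptime(
                str(crash_time.year) + " " + line[:15], "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # skip lines that do not start with a timestamp
        if start <= stamp <= crash_time:
            yield line


# Hypothetical usage on a copy of the log (year and path are assumptions):
#   crash = datetime(2012, 6, 10, 19, 22, 46)
#   with open("messages") as f:
#       for line in lines_before_crash(f, crash, minutes=15):
#           print(line, end="")
```

Because the year is borrowed from the crash time, a window that spans
New Year would need extra handling.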



On Thu, Jun 21, 2012 at 1:04 AM, Georgios Magklaras
<georgios at biotek.uio.no> wrote:
> On 06/18/2012 08:44 AM, raj sourabh wrote:
>>
>> Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955
>> Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
>> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: making interface eth2 the new active one.
>> Jun 10 19:22:34 prddbs02 kernel: device eth2 entered promiscuous mode
>> Jun
>
> Before the soft lockup, what exactly caused the NETDEV WATCHDOG to lose
> eth0?
> For the __smp_call_function_many lockup, there were many fixes between 5.5
> and 5.6 relating to multipath and other third-party drivers
> that caused similar lockups. (Why are you on 5.5 and not at least 5.6? Which
> kernel are you running?)
>
> Best regards,
>
> --
> George Magklaras PhD
> RHCE no: 805008309135525
>
> Senior Systems Engineer/IT Manager
> Biotechnology Center of Oslo and
> the Norwegian Center for Molecular Medicine
> EMBnet TMPC Chair
>
> http://folk.uio.no/georgios
>
>
>
>
>> 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]
>> Jun 10 19:22:46 prddbs02 kernel: CPU 2:
>> Jun 10 19:22:46 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>> Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
>> Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
>> Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
>> Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
>> Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
>> Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
>> Jun 10 19:22:46 prddbs02 kernel: RAX: 0000000000000006 RBX: 0000000000000007 RCX: 0000000000000000
>> Jun 10 19:22:46 prddbs02 kernel: RDX: 00000000000000ff RSI: 00000000000000ff RDI: 00000000000000c0
>> Jun 10 19:22:46 prddbs02 kernel: RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000038
>> Jun 10 19:22:46 prddbs02 kernel: R10: ffff8108e79a5b98 R11: 0000000000000000 R12: ffffffff80143e16
>> Jun 10 19:22:46 prddbs02 kernel: R13: 0000000000000003 R14: ffff810366ec2c58 R15: ffff81093da13340
>> Jun 10 19:22:46 prddbs02 kernel: FS: 000000004189d940(0063) GS:ffff81012071cec0(0000) knlGS:0000000000000000
>> Jun
>
> ...
>
>> Thanks for any help in advance :)
>>
>> Regards,
>> Raj
>
>
>



