RHEL 5.5 Oracle RAC cluster rebooted due to processor hang

raj sourabh rajsourabh1 at gmail.com
Mon Jun 18 06:44:15 UTC 2012


Hi,

I have raised this question with Red Hat support as well, but I would also
like to collect your thoughts on the issue below.
----
*Platform: RHEL 5.5*
*Arch: 64-bit, running Oracle RAC 11gR2 (2-node cluster)*
*Problem description: Node 2 of the cluster was rebooted. The reboot was
initiated by Oracle for unknown reasons. /var/log/messages shows soft
lockups with CPUs stuck for 10 seconds (please see the logs below). What
could be the cause of this?*


Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955
Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out
Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: making interface eth2 the new active one.
Jun 10 19:22:34 prddbs02 kernel: device eth2 entered promiscuous mode
Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]
Jun 10 19:22:46 prddbs02 kernel: CPU 2:
Jun 10 19:22:46 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
Jun 10 19:22:46 prddbs02 kernel: RAX: 0000000000000006 RBX: 0000000000000007 RCX: 0000000000000000
Jun 10 19:22:46 prddbs02 kernel: RDX: 00000000000000ff RSI: 00000000000000ff RDI: 00000000000000c0
Jun 10 19:22:46 prddbs02 kernel: RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000038
Jun 10 19:22:46 prddbs02 kernel: R10: ffff8108e79a5b98 R11: 0000000000000000 R12: ffffffff80143e16
Jun 10 19:22:46 prddbs02 kernel: R13: 0000000000000003 R14: ffff810366ec2c58 R15: ffff81093da13340
Jun 10 19:22:46 prddbs02 kernel: FS: 000000004189d940(0063) GS:ffff81012071cec0(0000) knlGS:0000000000000000
Jun 10 19:22:46 prddbs02 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun 10 19:22:46 prddbs02 kernel: CR2: 00002aaaac004000 CR3: 0000000928447000 CR4: 00000000000006e0
Jun 10 19:22:46 prddbs02 kernel:
Jun 10 19:22:46 prddbs02 kernel: Call Trace:
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff80077778>] smp_call_function_many+0x38/0x4c
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff80077869>] smp_call_function+0x4e/0x5e
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8007754d>] do_flush_tlb_all+0x0/0x6a
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff881fcb28>] :dm_mod:dev_status+0x0/0x38
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800958c1>] on_each_cpu+0x10/0x22
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800d2017>] __remove_vm_area+0x2b/0x42
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800d2046>] remove_vm_area+0x18/0x25
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800d209a>] __vunmap+0x47/0xed
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff881fdeff>] :dm_mod:ctl_ioctl+0x237/0x25b
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800424bd>] do_ioctl+0x55/0x6b
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff800304d6>] vfs_ioctl+0x457/0x4b9
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8000d3e9>] dput+0x2c/0x114
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8004cbb7>] sys_ioctl+0x59/0x78
Jun 10 19:22:46 prddbs02 kernel: [<ffffffff8005e116>] system_call+0x7e/0x83
Jun 10 19:22:46 prddbs02 kernel:
Jun 10 19:23:04 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]
Jun 10 19:23:04 prddbs02 kernel: CPU 4:
Jun 10 19:23:04 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix li:
Jun 10 19:23:04 prddbs02 kernel: Pid: 8758, comm: eecd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:23:04 prddbs02 kernel: RIP: 0010:[<ffffffff80065bfc>] [<ffffffff80065bfc>] .text.lock.spinlock+0x2/0x30
Jun 10 19:23:04 prddbs02 kernel: RSP: 0018:ffff8108997d1bc0 EFLAGS: 00000286
Jun 10 19:23:04 prddbs02 kernel: RAX: 0000000000000000 RBX: 00000000d2a03d30 RCX: 0000000000000001
Jun 10 19:23:04 prddbs02 kernel: RDX: ffff8108997d1d98 RSI: ffffffff885dd304 RDI: ffffffff8030e6c8
Jun 10 19:23:04 prddbs02 kernel: RBP: ffff8102f1aa8c10 R08: 0000000000000001 R09: ffff8108997d1bf8
Jun 10 19:23:04 prddbs02 kernel: R10: ffff81089d5285c0 R11: 0000000000000000 R12: 0000000000000000
Jun 10 19:23:04 prddbs02 kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 00000000000000fb
Jun 10 19:23:04 prddbs02 kernel: FS: 0000000000000000(0000) GS:ffff81012077dd40(0063) knlGS:00000000d2a04b90
Jun 10 19:23:04 prddbs02 kernel: CS: 0010 DS: 002b ES: 002b CR0: 000000008005003b
Jun 10 19:23:04 prddbs02 kernel: CR2: 00000000d2a02ddc CR3: 00000008f0781000 CR4: 00000000000006e0
Jun 10 19:23:04 prddbs02 kernel:
Jun 10 19:23:04 prddbs02 kernel: Call Trace:
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff80077764>] smp_call_function_many+0x24/0x4c
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff885dd304>] :smbus:smbus_GetCpuError_callback+0x0/0x14
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff80077869>] smp_call_function+0x4e/0x5e
Jun 10 19:23:04 prddbs02 kernel: [<ffffffff885e4fcd>] :smbus:smbus_ioctl+0x2880/0x2f74
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80063ff8>] thread_return+0x62/0xfe
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff880317ae>] :jbd:journal_stop+0x1f3/0x1ff
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff8002b379>] flush_tlb_page+0xac/0xda
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80011149>] do_wp_page+0x3fd/0x902
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80009677>] __handle_mm_fault+0xee5/0xfaa
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80022127>] __up_read+0x19/0x7f
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff80067b88>] do_page_fault+0x4fe/0x874
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff8006f1f5>] do_gettimeofday+0x40/0x90
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff885e56d7>] :smbus:smbus_ioctl_compat+0x16/0x1d
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff800fb8d4>] compat_sys_ioctl+0xc5/0x2b2
Jun 10 19:23:05 prddbs02 kernel: [<ffffffff8006249d>] sysenter_do_call+0x1e/0x76
Jun 10 19:23:05 prddbs02 kernel:
Jun 10 19:23:14 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]
Jun 10 19:23:14 prddbs02 kernel: CPU 4:
Jun 10 19:23:14 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Jun 10 19:23:14 prddbs02 kernel: Pid: 8758, comm: eecd Tainted: PF M 2.6.18-194.el5 #1
Jun 10 19:23:14 prddbs02 kernel: RIP: 0010:[<ffffffff80065bfc>] [<ffffffff80065bfc>]
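
In case it helps anyone pull just the lockup events out of a long
/var/log/messages, here is a minimal Python sketch. It assumes the log file
itself holds one syslog entry per line and that the kernel uses the
"BUG: soft lockup - CPU#N stuck for Ns! [comm:pid]" wording shown in the
excerpt above. The function name find_soft_lockups and the sample list are
purely illustrative and not part of any existing tool.

```python
import re

# Matches kernel lines of the form seen in this log, e.g.
#   Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]
# (Assumption: other kernel versions may word this message differently.)
SOFT_LOCKUP = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(?P<host>\S+)\s+kernel:\s+"
    r"BUG: soft lockup - CPU#(?P<cpu>\d+) stuck for (?P<secs>\d+)s!\s+"
    r"\[(?P<comm>[^:\]]+):(?P<pid>\d+)\]"
)


def find_soft_lockups(lines):
    """Return (timestamp, cpu, command, pid) for every soft-lockup line."""
    events = []
    for line in lines:
        match = SOFT_LOCKUP.match(line)
        if match:
            events.append(
                (
                    match.group("ts"),
                    int(match.group("cpu")),
                    match.group("comm"),
                    int(match.group("pid")),
                )
            )
    return events


# A few lines copied from the log above, one entry per line.
sample = [
    "Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out",
    "Jun 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s! [multipathd:5060]",
    "Jun 10 19:23:04 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]",
    "Jun 10 19:23:14 prddbs02 kernel: BUG: soft lockup - CPU#4 stuck for 10s! [eecd:8758]",
]

for ts, cpu, comm, pid in find_soft_lockups(sample):
    print(f"{ts}  CPU#{cpu}  {comm} (pid {pid})")
```

To run it against a real file, pass an open file object instead of the
sample list, for example find_soft_lockups(open("/var/log/messages")).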


Thanks in advance for any help :)

Regards,
Raj


