[libvirt-users] host crashes "unable to handle paging request"

Raphael Bauduin rblists at gmail.com
Fri Mar 28 14:45:20 UTC 2014


On Wed, Mar 26, 2014 at 8:45 AM, Raphael Bauduin <rblists at gmail.com> wrote:

> Hi,
>
> we have regular crashes of a KVM host with the error "unable to handle
> paging request".
> Can this be due to memory over-commitment even if some memory is still
> used by the kernel for caches and buffers?  (collectd graph shows no free
> memory, with 15G used, very little buffers, and 1G cache). There are 32GB
> of swap, of which only 150MB are used.
>
> I suspect this might be the direction to search to find the cause, but would be
> happy to learn from people versed in the kernel behaviour to confirm or
> reject my hypothesis. Below is the full error.
>
> Thanks!
>
> Raph
>
>
>
> 745 Mar 23 14:27:37 sMaster01 kernel: [241450.355339] BUG: unable to
> handle kernel paging request at ffff8804c001fade
>  746 Mar 23 14:27:37 sMaster01 kernel: [241450.355384] IP:
> [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
>  747 Mar 23 14:27:37 sMaster01 kernel: [241450.355433] PGD 1002063 PUD 0
>  748 Mar 23 14:27:37 sMaster01 kernel: [241450.355464] Oops: 0000 [#1] SMP
>  749 Mar 23 14:27:37 sMaster01 kernel: [241450.355496] last sysfs file:
> /sys/devices/system/cpu/cpu15/topology/thread_siblings
>  750 Mar 23 14:27:37 sMaster01 kernel: [241450.355551] CPU 4
>  751 Mar 23 14:27:37 sMaster01 kernel: [241450.355577] Modules linked in:
> ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4
> xt_conntrack nf_conntrack ipt_REJECT xt_tcpudp kvm_amd kvm ip6table_filter
> ip6_tables iptable_filter ip_tables x_tables tun nfsd exportfs nfs
> lockd fscache nfs_acl auth_rpcgss sunrpc bridge stp bonding dm_round_robin
> dm_multipath scsi_dh loop snd_pcm snd_timer snd soundcore snd_page_alloc
> serio_raw evdev tpm_tis tpm tpm_bios psmouse pcspkr amd64_edac_mod
> edac_core button edac_mce_amd shpchp i2c_piix4 container pci_hotplug
> i2c_core processor ext3 jbd mbcache dm_mirror dm_region_hash dm_log
> dm_snapshot dm_mod sd_mod crc_t10dif mptsas mptscsih mptbase lpfc
> ehci_hcd scsi_transport_fc tg3 scsi_tgt scsi_transport_sas ohci_hcd libphy
> scsi_mod usbcore nls_base thermal fan thermal_sys [last unloaded:
> scsi_wait_scan]
>  752 Mar 23 14:27:37 sMaster01 kernel: [241450.356084] Pid: 3557, comm:
> kjournald Not tainted 2.6.32.61vanilla #1 PRIMERGY BX630 S2
>  753 Mar 23 14:27:37 sMaster01 kernel: [241450.356141] RIP:
> 0010:[<ffffffff8117e9e9>]  [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
>  754 Mar 23 14:27:37 sMaster01 kernel: [241450.356196] RSP:
> 0018:ffff8804229abba0  EFLAGS: 00010202
>  755 Mar 23 14:27:37 sMaster01 kernel: [241450.356228] RAX:
> ffff8804c001fad6 RBX: ffff8802e7235080 RCX: 00011200061e5110
>  756 Mar 23 14:27:37 sMaster01 kernel: [241450.356279] RDX:
> 0000000000000008 RSI: 0000000000000008 RDI: ffff8802e7235080
>  757 Mar 23 14:27:37 sMaster01 kernel: [241450.356331] RBP:
> ffff8802e7235080 R08: 0000000000000000 R09: ffff880425c54c00
>  758 Mar 23 14:27:37 sMaster01 kernel: [241450.356383] R10:
> 0000000000000003 R11: 00000000022e539e R12: ffff8802e7235080
>  759 Mar 23 14:27:37 sMaster01 kernel: [241450.356434] R13:
> ffff8802e7235080 R14: ffff880425c54c00 R15: ffff8802e6281850
>  760 Mar 23 14:27:37 sMaster01 kernel: [241450.356486] FS:
> 00007faa6a757820(0000) GS:ffff88000fc80000(0000) knlGS:0000000000000000
>  761 Mar 23 14:27:37 sMaster01 kernel: [241450.356540] CS:  0010 DS: 0018
> ES: 0018 CR0: 000000008005003b
>  762 Mar 23 14:27:37 sMaster01 kernel: [241450.356573] CR2:
> ffff8804c001fade CR3: 00000000cc11f000 CR4: 00000000000006e0
>  763 Mar 23 14:27:37 sMaster01 kernel: [241450.356628] DR0:
> 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  764 Mar 23 14:27:37 sMaster01 kernel: [241450.356681] DR3:
> 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  765 Mar 23 14:27:37 sMaster01 kernel: [241450.356733] Process kjournald
> (pid: 3557, threadinfo ffff8804229aa000, task ffff88041490a300)
>  766 Mar 23 14:27:37 sMaster01 kernel: [241450.356788] Stack:
>  767 Mar 23 14:27:37 sMaster01 kernel: [241450.356812]  ffff880415382c00
> 0000000100000285 ffff8804229abfd8 0000000000005186
>  768 Mar 23 14:27:37 sMaster01 kernel: [241450.356852] <0>
> 0000000000000000 000000000f1c2776 ffff8804128efa38 ffff8802e7235080
>  769 Mar 23 14:27:37 sMaster01 kernel: [241450.356913] <0>
> ffff8802e7235080 ffff8802e7235080 ffff8800cdacae40 ffffffff8117eb5a
>  770 Mar 23 14:27:37 sMaster01 kernel: [241450.356993] Call Trace:
>  771 Mar 23 14:27:37 sMaster01 kernel: [241450.357021]
> [<ffffffff8117eb5a>] ? generic_make_request+0xcd/0x2f9
>  772 Mar 23 14:27:37 sMaster01 kernel: [241450.357058]
> [<ffffffff810b6034>] ? mempool_alloc+0x55/0x106
>  773 Mar 23 14:27:37 sMaster01 kernel: [241450.357091]
> [<ffffffff8117ee5c>] ? submit_bio+0xd6/0xf2
>  774 Mar 23 14:27:37 sMaster01 kernel: [241450.357125]
> [<ffffffff8110d83f>] ? submit_bh+0xf5/0x115
>  775 Mar 23 14:27:37 sMaster01 kernel: [241450.357158]
> [<ffffffff8110edc0>] ? sync_dirty_buffer+0x51/0x93
>  776 Mar 23 14:27:37 sMaster01 kernel: [241450.357196]
> [<ffffffffa01727c7>] ? journal_commit_transaction+0xaa6/0xe4f [jbd]
>  777 Mar 23 14:27:37 sMaster01 kernel: [241450.357252]
> [<ffffffffa0175194>] ? kjournald+0xdf/0x226 [jbd]
>  778 Mar 23 14:27:37 sMaster01 kernel: [241450.357288]
> [<ffffffff810651de>] ? autoremove_wake_function+0x0/0x2e
>  779 Mar 23 14:27:37 sMaster01 kernel: [241450.357324]
> [<ffffffffa01750b5>] ? kjournald+0x0/0x226 [jbd]
>  780 Mar 23 14:27:37 sMaster01 kernel: [241450.357357]
> [<ffffffff81064f11>] ? kthread+0x79/0x81
>  781 Mar 23 14:27:37 sMaster01 kernel: [241450.357391]
> [<ffffffff81011baa>] ? child_rip+0xa/0x20
>  782 Mar 23 14:27:37 sMaster01 kernel: [241450.357425]
> [<ffffffff81016568>] ? read_tsc+0xa/0x20
>  783 Mar 23 14:27:37 sMaster01 kernel: [241450.357456]
> [<ffffffff81064e98>] ? kthread+0x0/0x81
>  784 Mar 23 14:27:37 sMaster01 kernel: [241450.357487]
> [<ffffffff81011ba0>] ? child_rip+0x0/0x20
>  785 Mar 23 14:27:37 sMaster01 kernel: [241450.357517] Code: 5c c3 41 55
> 49 89 fd 41 54 55 53 48 83 ec 38 65 48 8b 04 25 28 00 00 00 48 89 44 24 28
> 31 c0 85 f6 0f 84 86 00 00 00 48 8b 47 10 <48> 8b 40 08 48 8b 40 68 48 c1
> f8 09 74 74 89 f2 48 8b 0f 48 39
>  786 Mar 23 14:27:37 sMaster01 kernel: [241450.357738] RIP
> [<ffffffff8117e9e9>] bio_check_eod+0x29/0xcd
>  787 Mar 23 14:27:37 sMaster01 kernel: [241450.357772]  RSP
> <ffff8804229abba0>
>  788 Mar 23 14:27:37 sMaster01 kernel: [241450.357799] CR2:
> ffff8804c001fade
>  789 Mar 23 14:27:37 sMaster01 kernel: [241450.358183] ---[ end trace
> 608fcf1f5a482549 ]---
>
>
We have also had a guest crash with the same error, "unable to handle kernel
paging request", this time in the function __destroy_inode.
Could faulty memory cause this problem on both host and guest?
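
As a rough sanity check on the host oops quoted above, the register dump can
be compared against the faulting address. This is only a sketch: the excerpt
string and the helper name parse_registers are made up for illustration, and
the values are copied by hand from the trace. The marked instruction bytes
<48> 8b 40 08 in the "Code:" line decode to a load from 8 bytes past RAX, so
CR2 would be expected to equal RAX + 8.

```python
import re

# Register values copied by hand from the host oops quoted above.
OOPS_EXCERPT = """
RAX: ffff8804c001fad6 RBX: ffff8802e7235080 RCX: 00011200061e5110
CR2: ffff8804c001fade CR3: 00000000cc11f000 CR4: 00000000000006e0
"""

def parse_registers(text):
    """Return {name: int} for every 'NAME: <16 hex digits>' pair in the text.

    Hypothetical helper for this example only, not part of any tool.
    """
    pairs = re.findall(r"\b([A-Z][A-Z0-9]{1,2}):\s+([0-9a-f]{16})\b", text)
    return {name: int(value, 16) for name, value in pairs}

regs = parse_registers(OOPS_EXCERPT)

# Distance between the faulting address (CR2) and the base pointer (RAX).
offset = regs["CR2"] - regs["RAX"]

# Pointers to kernel structures are normally at least 8-byte aligned.
aligned = regs["RAX"] % 8 == 0

print(f"CR2 - RAX = {offset:#x}")
print(f"RAX 8-byte aligned: {aligned}")
```

With these values CR2 is exactly RAX + 8, and RAX is not 8-byte aligned,
which is unusual for a pointer to a kernel structure. That would be
consistent with a corrupted pointer being read out of the bio (bad RAM being
one possible cause among several), but it does not prove it.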

Raph