
Re: [Linux-cluster] Linux-cluster Digest, Vol 92, Issue 16




On Dec 21, 2011, at 12:34 PM, SATHYA - IT wrote:

Hi Adam,

Thanks for your response. We do not currently have Red Hat support for HA
and RS; our support covers only the server OS. The two nodes in the cluster
are running RHEL 6.2. The withdraw messages from the log file are as
follows:

Dec 21 10:32:43 filesrv2 avahi-daemon[9585]: Registering new address record
for 192.168.129.15 on bond0.IPv4.
Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1: fatal:
filesystem consistency error
Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1:   RG =
160469200
Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1:   function =
gfs2_setbit, file = fs/gfs2/rgrp.c, line = 95
Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1: about to
withdraw this file system
Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1: telling LM to
unmount
Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1: withdrawn
Dec 21 10:33:10 filesrv2 kernel: Pid: 26976, comm: smbd Not tainted
2.6.32-220.el6.x86_64 #1
Dec 21 10:33:10 filesrv2 kernel: Call Trace:
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa05508e2>] ?
gfs2_lm_withdraw+0x102/0x130 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff81090bdf>] ?
wake_up_bit+0x2f/0x40
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa0550a8a>] ?
gfs2_consist_rgrpd_i+0x4a/0x50 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa054b5d0>] ?
rgblk_free+0x1f0/0x200 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa054b992>] ?
gfs2_free_data+0x42/0x130 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa0524f80>] ? do_strip+0x450/0x470
[gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa05251bf>] ?
recursive_scan.clone.0+0xbf/0x280 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff81111aa7>] ?
find_lock_page+0x37/0x80
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff8115efb5>] ?
kmem_cache_alloc_notrace+0x115/0x130
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa052548d>] ?
trunc_dealloc+0x10d/0x130 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa0537be1>] ?
gfs2_log_commit+0x1c1/0x300 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa0526df3>] ?
gfs2_truncatei+0x4b3/0x820 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa0543569>] ?
gfs2_setattr+0x119/0x3d0 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffffa0543496>] ?
gfs2_setattr+0x46/0x3d0 [gfs2]
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff81192698>] ?
notify_change+0x168/0x340
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff81174de4>] ?
do_truncate+0x64/0xa0
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff811750b0>] ?
sys_ftruncate+0xf0/0x100
Dec 21 10:33:10 filesrv2 kernel: [<ffffffff8100b308>] ? tracesys+0xd9/0xde
Dec 21 10:33:16 filesrv2 avahi-daemon[9585]: Withdrawing address record for
192.168.129.15 on bond0.
Dec 21 10:36:20 filesrv2 kernel: INFO: task gfs2_logd:9769 blocked for more
than 120 seconds.
Dec 21 10:36:20 filesrv2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 10:36:20 filesrv2 kernel: gfs2_logd     D ffff8808a7824100     0
9769      2 0x00000000
Dec 21 10:36:20 filesrv2 kernel: ffff88087fe51dd0 0000000000000046
0000000000000000 000000004db7b07d
Dec 21 10:36:20 filesrv2 kernel: ffff88084d820cf8 0000000000000441
ffff88087fe51d70 ffffffff811a81be
Dec 21 10:36:20 filesrv2 kernel: ffff880888437af8 ffff88087fe51fd8
000000000000f4e8 ffff880888437af8
Dec 21 10:36:20 filesrv2 kernel: Call Trace:
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff811a81be>] ?
submit_bh+0x10e/0x150
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
Dec 21 10:36:20 filesrv2 kernel: [<ffffffffa05389ca>]
gfs2_log_flush+0x46a/0x6e0 [gfs2]
Dec 21 10:36:20 filesrv2 kernel: [<ffffffffa053736f>] ?
gfs2_ail1_empty+0x2f/0x1b0 [gfs2]
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Dec 21 10:36:20 filesrv2 kernel: [<ffffffffa0538d17>] gfs2_logd+0xd7/0x140
[gfs2]
Dec 21 10:36:20 filesrv2 kernel: [<ffffffffa0538c40>] ? gfs2_logd+0x0/0x140
[gfs2]
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Dec 21 10:36:20 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Dec 21 10:38:20 filesrv2 kernel: INFO: task gfs2_logd:9769 blocked for more
than 120 seconds.
Dec 21 10:38:20 filesrv2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 10:38:20 filesrv2 kernel: gfs2_logd     D ffff8808a7824100     0
9769      2 0x00000000
Dec 21 10:38:20 filesrv2 kernel: ffff88087fe51dd0 0000000000000046
0000000000000000 000000004db7b07d
Dec 21 10:38:20 filesrv2 kernel: ffff88084d820cf8 0000000000000441
ffff88087fe51d70 ffffffff811a81be
Dec 21 10:38:20 filesrv2 kernel: ffff880888437af8 ffff88087fe51fd8
000000000000f4e8 ffff880888437af8
Dec 21 10:38:20 filesrv2 kernel: Call Trace:
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff811a81be>] ?
submit_bh+0x10e/0x150
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
Dec 21 10:38:20 filesrv2 kernel: [<ffffffffa05389ca>]
gfs2_log_flush+0x46a/0x6e0 [gfs2]
Dec 21 10:38:20 filesrv2 kernel: [<ffffffffa053736f>] ?
gfs2_ail1_empty+0x2f/0x1b0 [gfs2]
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Dec 21 10:38:20 filesrv2 kernel: [<ffffffffa0538d17>] gfs2_logd+0xd7/0x140
[gfs2]
Dec 21 10:38:20 filesrv2 kernel: [<ffffffffa0538c40>] ? gfs2_logd+0x0/0x140
[gfs2]
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Dec 21 10:38:20 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Dec 21 10:40:20 filesrv2 kernel: INFO: task gfs2_logd:9769 blocked for more
than 120 seconds.
Dec 21 10:40:20 filesrv2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 10:40:20 filesrv2 kernel: gfs2_logd     D ffff8808a7824100     0
9769      2 0x00000000
Dec 21 10:40:20 filesrv2 kernel: ffff88087fe51dd0 0000000000000046
0000000000000000 000000004db7b07d
Dec 21 10:40:20 filesrv2 kernel: ffff88084d820cf8 0000000000000441
ffff88087fe51d70 ffffffff811a81be
Dec 21 10:40:20 filesrv2 kernel: ffff880888437af8 ffff88087fe51fd8
000000000000f4e8 ffff880888437af8
Dec 21 10:40:20 filesrv2 kernel: Call Trace:
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff811a81be>] ?
submit_bh+0x10e/0x150
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
Dec 21 10:40:20 filesrv2 kernel: [<ffffffffa05389ca>]
gfs2_log_flush+0x46a/0x6e0 [gfs2]
Dec 21 10:40:20 filesrv2 kernel: [<ffffffffa053736f>] ?
gfs2_ail1_empty+0x2f/0x1b0 [gfs2]
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Dec 21 10:40:20 filesrv2 kernel: [<ffffffffa0538d17>] gfs2_logd+0xd7/0x140
[gfs2]
Dec 21 10:40:20 filesrv2 kernel: [<ffffffffa0538c40>] ? gfs2_logd+0x0/0x140
[gfs2]
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Dec 21 10:40:20 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Dec 21 10:42:20 filesrv2 kernel: INFO: task gfs2_logd:9769 blocked for more
than 120 seconds.
Dec 21 10:42:20 filesrv2 kernel: "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 21 10:42:20 filesrv2 kernel: gfs2_logd     D ffff8808a7824100     0
9769      2 0x00000000
Dec 21 10:42:20 filesrv2 kernel: ffff88087fe51dd0 0000000000000046
0000000000000000 000000004db7b07d
Dec 21 10:42:20 filesrv2 kernel: ffff88084d820cf8 0000000000000441
ffff88087fe51d70 ffffffff811a81be
Dec 21 10:42:20 filesrv2 kernel: ffff880888437af8 ffff88087fe51fd8
000000000000f4e8 ffff880888437af8
Dec 21 10:42:20 filesrv2 kernel: Call Trace:
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff811a81be>] ?
submit_bh+0x10e/0x150
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff814ed1e3>] io_schedule+0x73/0xc0
Dec 21 10:42:20 filesrv2 kernel: [<ffffffffa05389ca>]
gfs2_log_flush+0x46a/0x6e0 [gfs2]
Dec 21 10:42:20 filesrv2 kernel: [<ffffffffa053736f>] ?
gfs2_ail1_empty+0x2f/0x1b0 [gfs2]
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Dec 21 10:42:20 filesrv2 kernel: [<ffffffffa0538d17>] gfs2_logd+0xd7/0x140
[gfs2]
Dec 21 10:42:20 filesrv2 kernel: [<ffffffffa0538c40>] ? gfs2_logd+0x0/0x140
[gfs2]
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff81090886>] kthread+0x96/0xa0
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff8100c14a>] child_rip+0xa/0x20
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Dec 21 10:42:20 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20
Dec 21 10:43:42 filesrv2 kernel: GFS2: fsid=samba:gen01.1: fatal: invalid
metadata block
Dec 21 10:43:42 filesrv2 kernel: GFS2: fsid=samba:gen01.1:   bh = 51194408
(magic number)
Dec 21 10:43:42 filesrv2 kernel: GFS2: fsid=samba:gen01.1:   function =
gfs2_meta_indirect_buffer, file = fs/gfs2/meta_io.c, line = 401
Dec 21 10:43:42 filesrv2 kernel: GFS2: fsid=samba:gen01.1: about to withdraw
this file system
Dec 21 10:43:42 filesrv2 kernel: GFS2: fsid=samba:gen01.1: telling LM to
unmount
Dec 21 10:43:42 filesrv2 kernel: GFS2: fsid=samba:gen01.1: withdrawn
Dec 21 10:43:42 filesrv2 kernel: Pid: 9710, comm: glock_workqueue Not
tainted 2.6.32-220.el6.x86_64 #1
Dec 21 10:43:42 filesrv2 kernel: Call Trace:
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa05508e2>] ?
gfs2_lm_withdraw+0x102/0x130 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff81090c30>] ?
wake_bit_function+0x0/0x50
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0550a35>] ?
gfs2_meta_check_ii+0x45/0x50 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa053b4a5>] ?
gfs2_meta_indirect_buffer+0x185/0x190 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0535e49>] ?
gfs2_inode_refresh+0x29/0x340 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff810ea694>] ?
rb_reserve_next_event+0xb4/0x370
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0535488>] ?
inode_go_lock+0x88/0xf0 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0533c07>] ?
do_promote+0x1c7/0x340 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0533ef8>] ?
finish_xmote+0x178/0x410 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0534d03>] ?
glock_work_func+0x133/0x1b0 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffffa0534bd0>] ?
glock_work_func+0x0/0x1b0 [gfs2]
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff8108b2b0>] ?
worker_thread+0x170/0x2a0
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff81090bf0>] ?
autoremove_wake_function+0x0/0x40
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff8108b140>] ?
worker_thread+0x0/0x2a0
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff81090886>] ? kthread+0x96/0xa0
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff8100c14a>] ? child_rip+0xa/0x20
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff810907f0>] ? kthread+0x0/0xa0
Dec 21 10:43:42 filesrv2 kernel: [<ffffffff8100c140>] ? child_rip+0x0/0x20

Thanks

Sathya Narayanan V
Solution Architect
M +91 9940680173 |T +91 44 42199500  | Service Desk +91 44 42199521
SERVICE - In PRECISION IT is a PASSION
----------------------------------------------------------------------------
-----------------------------
Precision Infomatic (M) Pvt Ltd
22, 1st Floor, Habibullah Road, T. Nagar, Chennai - 600 017. India.
www.precisionit.co.in


-----Original Message-----
From: linux-cluster-bounces redhat com
[mailto:linux-cluster-bounces redhat com] On Behalf Of
linux-cluster-request redhat com
Sent: Wednesday, December 21, 2011 10:30 PM
To: linux-cluster redhat com
Subject: Linux-cluster Digest, Vol 92, Issue 16



Today's Topics:

  1. GFS2 Consistency... (SATHYA - IT)
  2. Re: GFS2 Consistency... (Adam Drew)


----------------------------------------------------------------------

Message: 1
Date: Wed, 21 Dec 2011 16:11:40 +0530
From: "SATHYA - IT" <sathyanarayanan varadharajan precisionit co in>
To: <linux-cluster redhat com>
Subject: [Linux-cluster] GFS2 Consistency...
Message-ID: <00e701ccbfcd$20d1c100$62754300$ precisionit co in>
Content-Type: text/plain; charset="us-ascii"

Hi,



We have a cluster environment running GFS2 + CTDB + Samba. Due to some
unavoidable circumstances we were forced to hard reboot the server 2 to 3
times. After the third restart, everything worked fine without any issues.
But after 4 to 5 hours online, we got an alert reporting a file system
consistency error on one of the GFS2 partitions. Can hard rebooting a server
2 to 3 times damage a GFS2 file system? Is the file system really that
sensitive? We never had such issues with ext3/ext4 file systems in similar
scenarios. Can anyone comment on GFS2 consistency and whether it is
recommended for production environments?





Thanks



Sathya Narayanan V





This communication may contain confidential information.
If you are not the intended recipient it may be unlawful for you to read,
copy, distribute, disclose or otherwise use the information contained within
this communication.
Errors and Omissions may occur in the contents of this Email arising out of
or in connection with data transmission, network malfunction or failure,
machine or software error, malfunction, or operator errors by the person who
is sending the email.
Precision Group accepts no responsibility for any such errors or omissions.
The information, views and comments within this communication are those of
the individual and not necessarily those of Precision Group.
All email that is sent from/to Precision Group is scanned for the presence
of computer viruses, security issues and inappropriate content. However, it
is the recipient's responsibility to check any attachments for viruses
before use.

------------------------------

Message: 2
Date: Wed, 21 Dec 2011 10:09:46 -0500
From: Adam Drew <adrew redhat com>
To: linux clustering <linux-cluster redhat com>
Subject: Re: [Linux-cluster] GFS2 Consistency...
Message-ID: <C052F150-E6BA-41C5-B4C8-EC719105B73B redhat com>
Content-Type: text/plain; charset="windows-1252"


--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

Hello Sathya,

If you are experiencing GFS2 withdraws you may be running into a bug,
filesystem corruption, or both. If you have a Red Hat support contract I
suggest opening a support case with Red Hat as soon as possible. When you
open the support case you'll want to attach sosreports from all nodes (run
the sosreport command on every node in the cluster and attach the resulting
tarballs to the support case). If you've hit a withdraw you are likely to
keep hitting them, and data loss or corruption is a real possibility; Red
Hat support can help identify the source of the issue and provide relief.

If you don't have a Red Hat support contract then please reply to the thread
with the kernel versions you are running on all nodes and the full withdraw
message and call traces from the messages logs on the affected cluster.
You'll be able to identify the withdraw easily in the logs. We'll want the
withdraw messages which will include a pointer to the position in code where
the error occurred and the nature of the withdraw. We'll also need the stack
trace that follows the withdraw as it will allow us to understand the code
path involved.
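
As a rough aid for collecting that information, the sketch below pulls the
"function = ..., file = ..., line = ..." records out of log text. It is only
an illustration: the regular expression is modelled on the log lines quoted
in this thread, the names LOCATION and withdraw_locations are made up for
this example, and it assumes each log record has been joined back onto a
single line (mail clients often wrap them).

```python
import re

# Matches the location record GFS2 prints around a withdraw, for example:
#   GFS2: fsid=samba:hadata01.1:   function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 95
# The pattern is an assumption based on the log lines quoted in this thread.
LOCATION = re.compile(
    r"GFS2: fsid=(?P<fsid>\S+):\s+function = (?P<function>\w+), "
    r"file = (?P<file>\S+), line = (?P<line>\d+)"
)


def withdraw_locations(log_text):
    """Return (fsid, function, file, line) for each GFS2 location record."""
    return [
        (m["fsid"], m["function"], m["file"], int(m["line"]))
        for m in LOCATION.finditer(log_text)
    ]


# Two records taken from the log in this thread, with the wrapped line rejoined.
sample = (
    "Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1:   "
    "function = gfs2_setbit, file = fs/gfs2/rgrp.c, line = 95\n"
    "Dec 21 10:33:10 filesrv2 kernel: GFS2: fsid=samba:hadata01.1: withdrawn\n"
)

for record in withdraw_locations(sample):
    print(record)
```

The call traces still have to be copied by hand; the sketch only locates the
records that name the failing function and source line.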

Thanks,
Adam

--
Adam Drew
Software Maintenance Engineer
Support Engineering Group
Red Hat, Inc.
Desk: (919) 754-4126
Cell: (919) 389-5334






End of Linux-cluster Digest, Vol 92, Issue 16
*********************************************


We're withdrawing in gfs2_meta_indirect_buffer, which is a function that loads metadata into a buffer for use. Where we are specifically failing is in a call to gfs2_metatype_check, which does the work of ensuring that the metadata we're loading into a buffer is of the type we expect. There's a macro and another function we pass through, but ultimately we end up in gfs2_metatype_check_i, which compares the expected metadata type with the type found in the buffer loaded from disk. If they don't match we withdraw.
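
The kind of check described above can be sketched in a few lines. This is an illustration only, not the kernel code: check_meta_header is a made-up name, and the constants and header layout (a big-endian 32-bit magic number followed by a 32-bit type) are assumptions taken from my reading of gfs2_ondisk.h that should be verified against your kernel source. The log in this thread prints "(magic number)" next to the block number, so the sketch tests the magic field before the type.

```python
import struct

# Assumed values from gfs2_ondisk.h; verify against your kernel source.
GFS2_MAGIC = 0x01161970   # magic number expected at the start of a metadata block
GFS2_METATYPE_IN = 5      # metadata type for an indirect block


def check_meta_header(block, expected_type):
    """Return None if the header looks right, else a short reason string."""
    # Assumed layout: two big-endian 32-bit fields, magic first, then type.
    magic, mh_type = struct.unpack_from(">II", block, 0)
    if magic != GFS2_MAGIC:
        return "magic number"
    if mh_type != expected_type:
        return "metadata type"
    return None


# A well-formed header, a block of zeroes, and a block of the wrong type.
good = struct.pack(">II", GFS2_MAGIC, GFS2_METATYPE_IN) + bytes(16)
zeroed = bytes(24)
wrong_type = struct.pack(">II", GFS2_MAGIC, 4) + bytes(16)

for name, block in (("good", good), ("zeroed", zeroed), ("wrong_type", wrong_type)):
    print(name, check_meta_header(block, GFS2_METATYPE_IN))
```

In the real filesystem a non-None result at this point is what leads to the withdraw; the sketch just reports the reason.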

So what does this all mean? It means that either the data on disk is corrupt (a section of what we expect should be metadata is not, or is the wrong kind of metadata) or it is some kind of memory corruption where the data in memory is being corrupted such that when we examine the buffer it appears to be the wrong type. From what I have here to analyze I cannot say which it is.

Your first action should be to unmount the filesystem in question from all nodes, update gfs2-utils, and run gfs2_fsck on the filesystem. Once the filesystem check has completed you can mount the filesystem again and return to production. If the issue goes away then it was some anomalous sort of on-disk corruption. If the issue comes back then it is quite likely to be either a bug in GFS2 or something very wrong with the environment or workload (such as the filesystem being mounted without locking, something doing block-level writes to metadata areas on disk, or something of that nature).
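
The order of operations matters here, so it may help to write the plan down before touching anything. The sketch below only builds a list of commands and runs nothing. Everything specific in it is an assumption for illustration: the function name fsck_plan, the node name filesrv1, the device path and the mount point are hypothetical, and it assumes the checker is installed as fsck.gfs2 (the message above calls it gfs2_fsck; check which name your gfs2-utils package provides).

```python
def fsck_plan(device, mountpoint, nodes):
    """Build the ordered (node, argv) steps for an offline check; nothing is executed."""
    steps = []
    # 1. Unmount on every node first: the check must not run on a mounted filesystem.
    for node in nodes:
        steps.append((node, ["umount", mountpoint]))
    # 2. Update the tools on the node that will run the check.
    checker = nodes[0]
    steps.append((checker, ["yum", "-y", "update", "gfs2-utils"]))
    # 3. Run the check once, from a single node (-y answers yes to repair prompts).
    steps.append((checker, ["fsck.gfs2", "-y", device]))
    # 4. Mount the filesystem again on every node.
    for node in nodes:
        steps.append((node, ["mount", "-t", "gfs2", device, mountpoint]))
    return steps


# Hypothetical device, mount point and first node name; substitute your own.
plan = fsck_plan("/dev/vg_cluster/gen01", "/mnt/gen01", ["filesrv1", "filesrv2"])
for node, argv in plan:
    print(node + ":", " ".join(argv))
```

Keeping the plan as data makes it easy to review that the check runs only after every node has unmounted, and from one node only.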

If you find that you encounter further difficulties with the filesystem post-fsck I would advise, if you can, purchasing support for the Resilient Storage add-on entitlement and engaging support so that my group and I can assist you further. If you are unable to do so then you can create a bug report at bugzilla.redhat.com; but note that there are no production SLAs on bugzilla.

Good luck. I hope this helps in some capacity.

Thanks,
Adam






