[rhelv6-list] Nasty bug with writing to resyncing RAID-5 Array

Daryl Herzmann akrherz at iastate.edu
Tue Sep 25 19:27:54 UTC 2012


Well howdy,

Lo and behold, today's kernel errata has this in the changelog:

- [md] raid1, raid10: avoid deadlock during resync/recovery. (Dave
Wysochanski) [845464 835613]
- [md] raid5: Reintroduce locking in handle_stripe() to avoid racing
(Jes Sorensen) [846836 828065]

Looks like I can finally update! :)
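
A rough way to check whether a given kernel changelog carries the raid5 fix is to search it for the Bugzilla IDs in brackets. The Python below is only a sketch: the function name, the sample text (copied from the entries above) and the parsing rules (entries start with "- ", bug IDs appear as space-separated numbers in square brackets) are my own assumptions, not a documented format.

```python
import re

# Changelog excerpt as quoted in this message; used only as sample input.
SAMPLE_CHANGELOG = """\
- [md] raid1, raid10: avoid deadlock during resync/recovery. (Dave
Wysochanski) [845464 835613]
- [md] raid5: Reintroduce locking in handle_stripe() to avoid racing
(Jes Sorensen) [846836 828065]
"""


def entries_mentioning(changelog_text, bug_ids):
    """Return changelog entries that cite any of the given bug IDs.

    Assumes each entry starts with "- " and may wrap onto following lines,
    and that bug IDs are listed as numbers inside square brackets.
    """
    entries = []
    current = None
    for line in changelog_text.splitlines():
        stripped = line.strip()
        if line.startswith("- "):
            # A new entry begins; store the previous one, if any.
            if current is not None:
                entries.append(current)
            current = stripped
        elif current is not None and stripped and not stripped.startswith("*"):
            # Continuation of a wrapped entry.
            current += " " + stripped
        elif current is not None:
            # A blank line or a "*" header line ends the entry.
            entries.append(current)
            current = None
    if current is not None:
        entries.append(current)

    wanted = set(bug_ids)
    hits = []
    for entry in entries:
        cited = set()
        for group in re.findall(r"\[([\d ]+)\]", entry):
            cited.update(group.split())
        if cited & wanted:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    for entry in entries_mentioning(SAMPLE_CHANGELOG, {"828065"}):
        print(entry)
```

On a real system the text could come from `rpm -q --changelog kernel` instead of the sample string.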

daryl

On Mon, Sep 3, 2012 at 9:28 AM, Daryl Herzmann <akrherz at iastate.edu> wrote:
> On Thu, Aug 16, 2012 at 1:08 PM, David C. Miller
> <millerdc at fusion.gat.com> wrote:
>>
>>
>> ----- Original Message -----
>>> From: "Daryl Herzmann" <akrherz at iastate.edu>
>>> To: "Red Hat Enterprise Linux 6 (Santiago) discussion mailing-list" <rhelv6-list at redhat.com>
>>> Sent: Wednesday, August 15, 2012 7:32:15 AM
>>> Subject: Re: [rhelv6-list] Nasty bug with writing to resyncing RAID-5 Array
>>>
>>> On Sun, Jun 24, 2012 at 12:48 PM, Stephen John Smoogen
>>> <smooge at gmail.com> wrote:
>>> > On 23 June 2012 11:04, Daryl Herzmann <akrherz at iastate.edu> wrote:
>>> >> On Fri, Jun 22, 2012 at 4:03 PM, Stephen John Smoogen
>>> >> <smooge at gmail.com> wrote:
>>> >>> On 22 June 2012 14:10, daryl herzmann <akrherz at iastate.edu>
>>> >>> wrote:
>>> >>>> Howdy,
>>> >>>>
>>> >>>> The RHEL6.3 release notes have a curious entry:
>>> >>>>
>>> >>>> http://docs.redhat.com/docs/en-US/Red_Hat_Enterprise_Linux/6/html/6.3_Technical_Notes/kernel_issues.html
>>> >>>>
>>> >>>>  kernel component
>>> >>>>
>>> >>>>  Due to a race condition, in certain cases, writes to RAID4/5/6
>>> >>>>  while the array is reconstructing could hang the system
>>> >>>>
>>> >>>> Wow, I am reproducing it frequently here.  Simply set up a
>>> >>>> software RAID-5 array and do some write IO to it; eventually
>>> >>>> things start hanging and the power button needs to be pressed.
>>> >>>>
>>> >>>> Oh man.
>>> >>>
>>> >>> Well, the race condition they are mentioning should only happen
>>> >>> when the RAID array is reconstructing. This sounds like a
>>> >>> different bug/problem. What kind of disks, type of RAID, etc.?
>>> >>
>>> >> Thanks for the response.  I am not sure of the difference between
>>> >> 'reconstructing' and 'resyncing' and/or 'syncing'.  The
>>> >> reproduction case was quite easy for me.
>>> >>
>>> >> 1. Create a software raid5
>>> >> 2. Immediately create a filesystem on this raid5 while the initial
>>> >>    sync is underway
>>> >> 3. IO to the RAID device eventually stops, even for the software
>>> >>    raid5 sync
>>> >
>>> > OK, reconstructing is where the initial RAID drives pair up with
>>> > each other. Resyncing, I believe, is where a RAID which has already
>>> > been created is putting the data across its disks. Basically, run
>>> > cat /proc/mdstat: if there is a progress line (====>), then you are
>>> > reconstructing the disk array. In the example you give above, the
>>> > disks would be reconstructing.
>>> >
>>> > So the next thing to do is figure out why you are able to trigger
>>> > it so consistently. That may be due to:
>>> > CPU Type:
>>> > RAM Amount:
>>> > Disk controllers:
>>> > Disk types (SATA, SAS, SCSI, PATA):
>>> > RAID type:
>>> > RAID layout (same controller, different controller, etc):
>>>
>>> I don't seem to have much trouble reproducing it; I just had another
>>> machine do it this morning.  Nehalem processor, 12 GB RAM, Dell
>>> PowerEdge T400, Perc 6i controller, software RAID 5, Seagate 2 TB
>>> Barracuda drives...
>>>
>>> Does anybody have the bugzilla ticket associated with this or perhaps
>>> a knowledge base article on it?
>>>
>>> daryl
>>>
>>
>> I would like to know too. I have not seen this issue yet but I do have some large RAID6 arrays.
>
> The private bugzilla tracking this is:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=828065
>
> It appears the hope is to resolve this for the RHEL6.4 release.
>
> daryl
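
For anyone following the /proc/mdstat check suggested earlier in the thread, here is a rough Python sketch that reports whether an md array currently shows a resync or recovery in progress. The regular expressions, the assumption that device names start with "md", and the sample text are mine (the sample is hypothetical, not output from any machine in this thread), so treat it as illustrative only.

```python
import re

# Assumption: array lines look like "md0 : active raid5 ..." and the
# progress line contains e.g. "recovery =  0.5%" or "resync = 12.3%".
DEVICE_RE = re.compile(r"^(md\S*) : ")
PROGRESS_RE = re.compile(r"(resync|recovery|reshape|check)\s*=\s*([\d.]+)%")

# Hypothetical sample in the general shape of /proc/mdstat output.
SAMPLE_MDSTAT = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      3906763776 blocks level 5, 512k chunk, algorithm 2 [3/2] [UU_]
      [>....................]  recovery =  0.5% (10240000/1953381888)

unused devices: <none>
"""


def sync_activity(mdstat_text):
    """Map each md device to (operation, percent_done) if one is running."""
    active = {}
    current = None
    for line in mdstat_text.splitlines():
        device = DEVICE_RE.match(line)
        if device:
            # Remember which array the following indented lines belong to.
            current = device.group(1)
            continue
        progress = PROGRESS_RE.search(line)
        if progress and current is not None:
            active[current] = (progress.group(1), float(progress.group(2)))
    return active


if __name__ == "__main__":
    try:
        with open("/proc/mdstat") as handle:
            text = handle.read()
    except OSError:
        # Fall back to the sample on systems without md support.
        text = SAMPLE_MDSTAT
    for name, (operation, percent) in sync_activity(text).items():
        print(f"{name}: {operation} at {percent}%")
```

An empty result means no progress line was found, which is when it should be safer to start heavy writes on an affected kernel.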



