Could drbd randomly flip bits? Was: Database page corruption on disk occurring during mysqldump on a fresh database and Was: Spontaneous development of supremely large files on different ext3 filesystems

Mon Sep 17 19:58:55 UTC 2007

Hi Maurice,

>> If you're running into corruption both in ext3 metadata and in MySQL 
>> data, it is certainly not the fault of MySQL as you're likely aware.
> 
> I am hoping they are not related. The problems with MySQL surfaced 
> almost immediately after upgrading to 5.0.x.

It's possible that they are not related, but it could also be specific 
to 5.0 and still not be a MySQL bug.  I.e. MySQL 5.0 could be doing 
something that trips over an underlying OS or hardware bug and causes it 
to show up.  But it's hard to say anything for sure.  Nonetheless, I 
generally don't bother worrying about the possibility of MySQL bugs 
until I'm sure that the OS and hardware are stable.

>> You can see that there are in fact many bits flipped in each.  I 
>> would suspect higher-level corruption than
> 
> I initially thought this as well, but the explanation on the ext3 
> mailing list is that it really is just a lone flipped bit in both 
> instances. The other differences are due to fsck padding out the 
> block when it guesses what the correct size is.

Interesting.  Can you forward that mail to me personally, or summarize 
for the list?  I'd be interested to read the explanation.
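In the meantime, the "lone flipped bit" claim is easy to check 
independently if you still have a known-good and a corrupted copy of the 
same block.  Here is a minimal sketch in Python; the function name and 
the idea of having both copies available as equal-length byte strings are 
my assumptions, not anything from the ext3 thread:

```python
def flipped_bits(good: bytes, bad: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs.

    Hypothetical helper: assumes both blocks are the same length,
    i.e. any padding added by fsck has already been trimmed off.
    """
    if len(good) != len(bad):
        raise ValueError("blocks must be the same length")
    diffs = []
    for offset, (a, b) in enumerate(zip(good, bad)):
        x = a ^ b  # set bits in x mark the positions that differ
        for bit in range(8):
            if x & (1 << bit):
                diffs.append((offset, bit))
    return diffs
```

If that returns exactly one entry for the un-padded part of the block, 
the single-bit explanation holds; many entries scattered around would 
point back toward higher-level corruption.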

>> Do note that data on e.g. the PCI bus is not protected by any sort 
>> of checksum.  I've seen this cause corruption problems with PCI 
>> risers and RAID cards.  Are you using a PCI riser card?  Note that 
>> LSI does *not* certify their cards to be used on risers if you are 
>> custom building a machine.
> 
> Yes, there is a riser card. Wouldn't this imply that LSI is saying 
> you can't use a 1U or a 2U box?

Kind of.  Presumably you would be buying a vendor integrated solution 
where they have certified that the riser card and RAID card are 
compatible.  Presumably.  You'll also notice that most vendors are 
moving to controllers that aren't PCI{,-E,-X} slot based, and rather 
connect directly to a low-profile integrated slot.  This removes a few 
variables.  (And frees up some space.)

> It's kind of scary there is no end-to-end parity implemented 
> somewhere along the whole data path to prevent this. It sort of 
> defeats the point of RAID 6 and ECC.

I agree, it's pretty damn scary.  You can read about the story and the 
ensuing discussion here:

http://jcole.us/blog/archives/2006/09/04/on-1u-cases-pci-risers-and-lsi-megaraid/
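Short of the hardware protecting the whole path, the only real defense 
is a checksum computed by the application before the data leaves memory 
and verified after it comes back.  A rough sketch of the idea in Python, 
using CRC32 from the standard zlib module; the seal/unseal names and the 
4-byte prefix layout are purely illustrative, not how any particular 
database or filesystem does it:

```python
import zlib


def seal(payload: bytes) -> bytes:
    """Prefix the payload with its CRC32 (4 bytes, big-endian)."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unseal(record: bytes) -> bytes:
    """Verify the stored CRC32 and return the payload, or raise."""
    stored = int.from_bytes(record[:4], "big")
    payload = record[4:]
    if zlib.crc32(payload) != stored:
        raise IOError("checksum mismatch: data corrupted in transit or at rest")
    return payload
```

A single flipped bit anywhere in the record, whether it happened on the 
bus, in the controller, or on disk, changes the CRC and gets caught on 
read.  It only detects the problem, though; it can't repair it.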

> How did you determine this was the cause?

Isolating lots of variables.  The customer in question had a workload 
that could reproduce the problem reliably, although never in the same 
place or at the same time, which made it hard to track down, and never 
in debug mode (which likely slowed things down enough to avoid trouble).

I finally suggested that they isolate the riser card as a variable by 
plugging the RAID card directly into the slot.  Since it was a 1U 
machine, that required taking the metal frame off the card and leaving 
the case open (with the card hanging out into the datacenter aisle).  It 
could then be shown that with the riser, corruption always occurred, and 
without the riser, corruption never occurred.

Obviously, running the machines with cases open and cards plugged in 
directly was not an option, so the only other possible option was 
chosen: move to all new hardware with integrated RAID.  (HP, with their 
integrated SmartArray/cciss controller, was chosen as the vendor in this 
case.)

>> You mean a Serial Attached SCSI (SAS) controller, I assume?
> 
> No, it's SATA to SCSI.

Interesting.  I hadn't heard of such a thing until I just looked it up.  
But in any case that adds yet another variable (and a fairly uncommon 
one) to the mix.

Regards,

Jeremy

-- 
high performance mysql consulting
www.provenscaling.com