
F10+dmraid eats puppies! (and ate my system too)



I ran into this earlier in the week and after finally getting my machine back 
online am surprised to see that people aren't making a big stink about 
this... it's got subtle nuances that make it nearly impossible to fix without 
loss of data.

I've found the following threads/bugs that appear related:

 https://bugzilla.redhat.com/show_bug.cgi?id=474697
 http://forums.fedoraforum.org/showthread.php?t=206206
 http://forums.fedoraforum.org/showthread.php?t=206284

Here's what happened to me...

I upgraded from F9 to F10 back on Nov 29th, and things seemed fine.  I 
upgraded the kernel last Wednesday, rebooted, and started seeing all sorts of 
crazy weirdness.  At first the system wouldn't boot at all, dying on errors 
of "killing init" and "corrupted libraries".  I thought it sounded like FS 
corruption, so I booted the rescue CD, ran fsck (which came back clean), and 
then proceeded to re-install some of the packages with the corrupted 
libraries, so I could at least get the machine up and running again.

After several cycles of "rescue CD, install packages, reboot, fail", I 
decided that even if I could get it running I wasn't going to trust it.  Went 
back to the rescue CD, and started backing up files onto other machines on 
the network here.

I then re-installed the machine, leaving my "/home" and "/usr/local" 
partitions as they were; reformatted everything else, but left those alone.  
Got the system up, but was then presented with the most shocking thing... it 
looked like my machine had basically done time-travel and was now *exactly* 
as it was on November 29th.  Files I know I'd edited were missing changes, 
e-mails were lost, databases were missing data.

Took me a while to figure it out, but here's what happened...

When I upgraded from F9 to F10, Anaconda detected my nvidia dmraid mirror and 
installed F10 onto both halves of the mirror.  When I rebooted, though, it 
only picked up *ONE HALF* of the mirror... /dev/sda.  It had the UUIDs right, 
but it didn't mount /dev/mapper/nvidia_xxxx but mounted sda instead.  When 
I did the kernel upgrade this week, *that* mounted sdb.  When I reinstalled, 
it *also* mounted sdb, not sda or dmraid.
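
If you want to check whether your own box is in the same state, something 
along these lines might help.  It's only a rough sketch: it assumes the usual 
/proc/mounts layout (device, mount point, fstype, ...) and that a raw disk 
shows up as /dev/sdX or /dev/hdX.  The function names are made up for 
illustration and aren't part of dmraid or any Fedora tool.

```python
import re


def mount_source(mounts_text, mount_point="/"):
    """Return the device mounted at mount_point, or None if not found.

    mounts_text is expected to look like the contents of /proc/mounts.
    """
    source = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[1] == mount_point:
            # Later entries shadow earlier ones (e.g. "rootfs" then the real root).
            source = fields[0]
    return source


def looks_like_raw_disk(device):
    """True for names such as /dev/sda or /dev/sdb2 (assumed naming scheme)."""
    return re.fullmatch(r"/dev/[sh]d[a-z]+\d*", device or "") is not None


def check_mount(mounts_text, mount_point="/"):
    """Return a short human-readable verdict for one mount point."""
    device = mount_source(mounts_text, mount_point)
    if device is None:
        return f"{mount_point}: not mounted"
    if looks_like_raw_disk(device):
        return f"{mount_point}: on raw disk {device} (mirror may be bypassed)"
    return f"{mount_point}: on {device}"
```

On a live system you'd pass it the text of /proc/mounts, e.g. 
check_mount(open("/proc/mounts").read()).  If "/" or "/home" comes back as a 
raw disk on a machine that's supposed to be running off /dev/mapper, that's 
the symptom I hit.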

When I looked at sda directly, I saw all of my recent changes to files that 
I'd made since the 29th.  When I looked at sdb directly, it was a snapshot of 
what my machine looked like on the 29th.

When we actually manage to get the bug fixed that caused this, anyone who's 
had this problem is potentially going to be in for a bigger world of hurt 
when applying the fix... I don't even think we can (with confidence) just 
nuke one half of the mirror and rebuild based on what's on the other half; how 
do we know which half they've been using?  In my case, I'd made ~2wks of 
changes to sda not knowing that I was only using half the mirror, and then 
after updating the kernel got bumped over to sdb and made changes there while 
trying to fix it.  Neither one was a mirror of the other, and each one had 
something on it that needed to be preserved.  YUCK.
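
For anyone trying to work out what lives on which half before rebuilding, 
here's a minimal sketch of the kind of comparison I mean.  It assumes you've 
mounted each half read-only somewhere (the mount points are whatever you 
choose; nothing here is specific to dmraid) and it only compares file 
modification times, so treat the output as a list of things to look at by 
hand, not as a safe merge plan.

```python
import os


def list_files(root):
    """Map each regular file's path (relative to root) to its mtime."""
    found = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a plain file.
            if os.path.isfile(full) and not os.path.islink(full):
                found[os.path.relpath(full, root)] = os.stat(full).st_mtime
    return found


def compare_halves(left_root, right_root):
    """Report files that exist on one side only, or are newer on one side."""
    left = list_files(left_root)
    right = list_files(right_root)
    report = {
        "only_left": [],
        "only_right": [],
        "newer_left": [],
        "newer_right": [],
    }
    for rel in sorted(set(left) | set(right)):
        if rel not in right:
            report["only_left"].append(rel)
        elif rel not in left:
            report["only_right"].append(rel)
        elif left[rel] > right[rel]:
            report["newer_left"].append(rel)
        elif right[rel] > left[rel]:
            report["newer_right"].append(rel)
    return report
```

If both "newer_left" and "newer_right" come back non-empty, you're in the 
same boat I was: neither half can simply be thrown away.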

Once I realized what'd happened to my machine I went into the BIOS and turned 
off the nvidia fakeraid and re-installed directly onto the two drives.  This 
isn't what I want, as I'd at least like to have _some_ mirror of my data 
somewhere, but it was the only way I could get this machine running again.

Be forewarned.... F10+dmraid is *DANGEROUS* right now...

-- 
Graham TerMarsch

