cvs update

Stacy J Brandenburg sbranden at redhat.com
Mon Nov 20 21:00:48 UTC 2006


Wednesday is the best day this week for me.

Let's see if we can arrange for that.


Matthew Galgoci wrote:
> Some time on Friday, cvs-int.fedora.phx.redhat.com sustained undetermined
> storage problems and resulting filesystem corruption. As best I can figure,
> we had one drive in a raid6 array drop offline, and another disk in that
> array emit scsi errors. Now, you're probably thinking, this is raid6, it
> should have been able to sustain losing two disks and keep on going.
> 
> Well, you're right and you're wrong. If two disks had simply dropped out of
> the array, we'd be fine. That wasn't the case, however. Somewhere in the
> equation is data corruption. Raid is great up until your hardware corrupts
> the data. To support this claim, all you need to do is realize that we
> sustained numerous ext3 errors and had the journal abort, and the root fs
> went read-only.
> 
> I did my level best to revive the system on Friday and Saturday. I was able
> to get it pxe booted onto rescue media, which helped recovery immensely. I
> took numerous screenshots to chronicle what I went through as I attempted
> to recover the raid6 arrays and the logical volumes.
> 
> http://people.redhat.com/~mgalgoci/cvs-int.jpg
> http://people.redhat.com/~mgalgoci/cvs-int2.jpg
> http://people.redhat.com/~mgalgoci/cvs-int3.jpg
> http://people.redhat.com/~mgalgoci/cvs-int4.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs5.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs6.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs8.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs9.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs10.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs11.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs12.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs13.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs14.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs15.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs18.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs17.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs16.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs19.jpg
> http://people.redhat.com/~mgalgoci/fedora-cvs20.jpg
> 
> After #20, I said the hell with it, time to move on.
> 
> We've installed one of the new Dell 2950 machines that Dell was kind enough
> to donate to the Fedora Project. Mike McGrath is in the process of updatifying
> and restorifying the data from backups.
> 
> I have a Dell tech coming on site again today to do some more work on the
> old new cvs-int server. I think we know what the issues are on it and we'll
> have it usable again in the next day or so.
> 
> In the meantime, I think we need to take a look at all the Dell Fedora boxes
> and check the scsi drives in them. There are known issues with certain drive
> firmware that cause drives to go offline and report spurious errors.
> 
> The relevant Dell update is here:
> 
> http://support.us.dell.com/support/downloads/download.aspx?c=us&cs=555&l=en&s=biz&releaseid=R123859&formatcnt=1&libid=0&fileid=164751
> 
> We'll need downtime and hands on site to do this update. I'm sure Stacy will
> be able to assist.
> 

-- 
========================================================
= Stacy J. Brandenburg                    Red Hat Inc. =
= Manager, Network Operations      sbranden at redhat.com =
= 919-754-4313                   http://www.redhat.com =
========================================================



