[dm-devel] Fibre Channel related crash

William Alberto Lovaton Tovar williama_lovaton at coomeva.com.co
Tue Jun 21 13:49:44 UTC 2005


Hi there,

I'm writing here because I'm not sure where to ask for help.  If this
doesn't belong here please point me to the right list or bug report.

I have an IBM xSeries 445 server (SMP 8x HT, 18GB RAM, SAN) with Fedora
Core 3 and Oracle 9.2.0.6 running an enterprise web application with
PHP.  The app is heavily transactional and both servers are under heavy
load (7+ million requests per day in Apache).

The database has been working great with excellent performance, but I
have been experiencing some random crashes, the most recent one
yesterday after 75 days of uptime.  The messages log doesn't say
anything about the problem, but I think it is related to the QLogic FC
interface since the Oracle datafiles are stored on a SAN.

I reported a bug against the tg3 NIC driver because I was seeing some
backtraces in the log, but that is not what is causing the server crash.
David S. Miller has been working on it:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=152929

When I rebooted the server, some errors showed up in the log about
"Buffer I/O error" on a SCSI device.

I have to say that I couldn't install Fedora Core 3 with the SAN
partitions enabled, since they confuse the installer into thinking that
those "exported" disks are the first ones and the local disk is the
last one.

Without the SAN:
Local disk --> sda

With the SAN:
SAN 1      --> sda
SAN 2      --> sdb
Local disk --> sdc

I had to do the installation without the SAN, and after "presenting" the
SAN disks to the server I got the second layout.  It has been working
fine that way ever since... weird!
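
To tell which sdX name belongs to the SAN and which to the local disk
after such a renumbering, something like the following could be used.
It is only a minimal sketch, assuming a 2.6 kernel with sysfs mounted at
/sys and the usual "vendor" and "model" attribute files under
/sys/block/sdX/device/.  The function name scsi_disk_identities and its
sysfs_root parameter are made up for this illustration and are not part
of my setup.

```python
from pathlib import Path


def scsi_disk_identities(sysfs_root="/sys"):
    """Map each sd* block device to its (vendor, model) strings from sysfs.

    Hypothetical helper for illustration; assumes the standard sysfs
    layout <sysfs_root>/block/sdX/device/{vendor,model}.
    """
    disks = {}
    block_dir = Path(sysfs_root) / "block"
    if not block_dir.is_dir():
        return disks
    for dev in sorted(block_dir.glob("sd*")):
        ident = []
        for attr in ("vendor", "model"):
            attr_file = dev / "device" / attr
            # An attribute file may be missing; record an empty string then.
            ident.append(attr_file.read_text().strip() if attr_file.is_file() else "")
        disks[dev.name] = tuple(ident)
    return disks


if __name__ == "__main__":
    for name, (vendor, model) in scsi_disk_identities().items():
        print(f"{name}: {vendor} {model}")
```

Comparing the output with and without the SAN presented should show
which device names moved.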

You can see more information about this in the bug report above.

Since I don't know exactly where to look, I'm attaching both the
messages log from after the reboot, with the errors mentioned above, and
the output of 'lspci -v' so that you can see the hardware in my setup.

I hope you can give me a hint about this or point me to a place where I
can get more information.  Under which Bugzilla component should I file
a bug report?

Thanx,


-William

PS. As you can see in messages.txt, the server clock seems a bit messed
up.  It always boots with the incorrect time, and after I fix it in the
operating system it is wrong again after the next reboot.


-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: lspci-2005-06-20.txt
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20050621/99b192f9/attachment.txt>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: messages-2005-06-20.txt
URL: <http://listman.redhat.com/archives/dm-devel/attachments/20050621/99b192f9/attachment-0001.txt>

