
[Linux-cluster] Re: Linux-cluster Digest, Vol 64, Issue 12



Hi all

Just an update on CentOS 5.3 and luci.

When the network is installed with IPv6 enabled, the default, well-known entry 127.0.0.1 localhost.localdomain localhost is replaced with a new
::1 localhost.localdomain localhost.

Luci lives on 127.0.0.1, so if that is not in /etc/hosts, luci will not start!

I think the biggest frustration with this was that no proper logfile entry is written.

Tks
Andre


# Do not remove the following line, or various programs
# that require network functionality will fail.
::1	localhost.localdomain	localhost	apollo
127.0.0.1	localhost.localdomain	localhost
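
Since luci gives no log entry when this goes wrong, a quick check of the hosts file can save some time. The following is only a minimal sketch: the function names are made up for illustration, and it assumes the only thing that matters is that 127.0.0.1 maps to the name localhost, as described above.

```python
def loopback_names(hosts_text):
    """Map each loopback address found in hosts-file text to its hostnames."""
    found = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        address, *names = line.split()
        if address in ("127.0.0.1", "::1"):
            found.setdefault(address, []).extend(names)
    return found


def has_ipv4_localhost(hosts_text):
    """True when a 127.0.0.1 entry exists and lists the name 'localhost'."""
    return "localhost" in loopback_names(hosts_text).get("127.0.0.1", [])
```

To test a real machine, pass in the contents of /etc/hosts, for example has_ipv4_localhost(open("/etc/hosts").read()). A False result matches the situation where luci refused to start here.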



On Aug 11, 2009, at 6:00 PM, linux-cluster-request redhat com wrote:


Today's Topics:

  1. Succesfull installation on centos 5.3 with live kvm	migration
     (Robert Verspuy)
  2. Centos 5.3 X64 a& luci (akapp)
  3. Re: Linux-cluster Digest, Vol 64, Issue 10 (Bob Peterson)
  4. RHEL 4.7 fenced fails -- stuck join state: S-2,2,1 (Robert Hurst)
  5. Re: do I have a fence DRAC device? (bergman merctech com)


----------------------------------------------------------------------

Message: 1
Date: Tue, 11 Aug 2009 14:59:13 +0200
From: Robert Verspuy <robert exa-omicron nl>
Subject: [Linux-cluster] Succesfull installation on centos 5.3 with
	live kvm	migration
To: linux clustering <linux-cluster redhat com>
Message-ID: <4A816B21 2040907 exa-omicron nl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Getting cluster software working, including KVM virtual machines with
live migration, can be a very difficult task with many obstacles.

But I would like to tell the mailing list that I just had some success.
And because nobody is around to hear the wonderful news,
I would like to share my happiness here ;)

My setup:
2 NAS servers and 1 Supermicro bladeserver with 5 blades.

The NAS servers are running Openfiler 2.3
both NAS servers have:
1 Transcend IDE 4Gbyte flashcard (on the ide port on the mainboard).
3 x Transcend 4Gbyte usb sticks
8 SATA disks.

The IDE flashcard is set up in a RAID-1 mirror (md0) with one USB
stick, providing the root FS for Openfiler.
The other 2 USB sticks have 5 partitions: 4 x 500 MB and 1 x 2 GB.
Those are mirrored together with RAID-1 (md5 through md8 are the 500 MB
partitions, and md9 is the 2 GB partition).
Then the 8 hard disks are also paired into RAID-1 mirrors
(md1 through md4).

Then I used DRBD (8.2.7) to mirror the 4 RAID-1 arrays of the disks (md1
through md4) and the 2 GB mirror (md9)
over the network to the other NAS server (drbd1, drbd2, drbd3 and drbd4 for the disks).
The 500 MB RAID-1 arrays are used to store the metadata of the 4 disk RAID-1 arrays.
The 2 GB DRBD device (drbd0) has internal metadata.

The 2 GB DRBD device (drbd0) is mounted as ext3 on only one server and is used
to store all kinds
of Openfiler information that is needed on both NAS servers,
such as the Openfiler config (mostly), the DHCP leases database and the OpenLDAP database.
Heartbeat makes sure that one NAS server is running all the
software, and if there are any problems,
it can switch over very easily.

drbd1 through drbd4 are set up as LVM PVs and bound together in one big VG.
From that VG, I created 5 x 5 GB LVs to be used as root devices for
blade1 through blade5.
These LVs are striped across 2 PVs for speed (although that's still my
only bottleneck at the moment, but more about this later...).
These LVs are exported over iSCSI.

I also created one big LV of around 600GB, which can be mounted through NFS.

Then a few more LV's are created (around 10GB, also iscsi) for every VM
I want.
For every iSCSI LV I create a separate target.

The Supermicro blades can boot from an iscsi device.
The exact scsi device is given through a DHCP option.
I only set up an initiator name in the iSCSI BIOS of the blade.

On the blade LV's I installed CentOS 5.3 (latest updates).
But with a few modifications.

I changed a few things in the initrd to bind eth0 to br0 during the
Linux boot,
before Linux takes over the iSCSI session from the BIOS, because
when you have a Linux root on iSCSI and try to attach eth0 to br0,
you lose network connectivity for a moment, which could crash Linux,
because everything it uses comes from the network (iSCSI root).
I also added a little script to the initrd to call iscsiadm with a fixed
iSCSI
target, because unfortunately iscsiadm can't read the iSCSI settings
from DHCP
or the Supermicro firmware.

When the blades are booted, they all join one Red Hat cluster, which
needs 3 nodes for quorum.
Because I have 5 blades, two can fail before everything stops working.
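
The numbers above follow from simple majority arithmetic. Here is a small sketch, assuming one vote per node and a plain majority rule; the function names are made up for illustration, and the real cluster configuration may assign votes differently.

```python
def quorum(total_votes):
    """Smallest strict majority of total_votes (one vote per node assumed)."""
    return total_votes // 2 + 1


def tolerated_failures(total_votes):
    """How many single-vote nodes can fail while the rest still hold quorum."""
    return total_votes - quorum(total_votes)


print(quorum(5), tolerated_failures(5))  # prints: 3 2
```

With 5 blades this gives a quorum of 3 and room for 2 failures, which matches the setup described here.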

Then I compiled the following software myself, because the versions in the
CentOS repo
and the testing repo didn't function correctly:
libvirt 0.7.0 (./configure --prefix=/usr)
kvm-88 (./configure --prefix=/usr --disable-xen)

The /usr/share/cluster/vm.sh from the default CentOS repo is still based
on Xen.
I downloaded the latest from
https://bugzilla.redhat.com/show_bug.cgi?id=412911
but it appears that that one is not working correctly either,
so I made some changes myself.

And now it's all working together very nicely.

I just ran a VM on blade1, and while this VM was running bonnie++ on a
NFS mount to the NAS server,
I live-migrated it about 10 times to blade2 and back.

During this bonnie++ run and the live migrations, I pinged the device.
Normal ping times are around 20-35 ms (I pinged through a
VPN line from my home to the data center).
I only saw one or two pings, right around the end of a live migration, that
were around 40-60 ms,
but no drops and no errors in bonnie++.

I will write up some more information about the complete setup and post it
somewhere on my blog or something,
but I just wanted to let everybody know that it can be done ;)

If you have any questions, let me know.

The only 'problem' I still have is the speed to and from the disks.
When I update any settings on the blade servers, I always do this on blade1. Then I shut it down, and on the NAS server I copy the content of the iSCSI LV
to an image file on the ext3 LV.
Then I power up blade1, wait until it rejoins the cluster,
and then shut down the other blades one by one. For each, I copy the image on the NAS
from the ext3 LV to the blade LV
and start the blade again.

I use drbd1 through drbd4 as 4 PVs for a VG.
The speed (hdparm -t on the NAS) of each PV is around 75 MB/sec
(except for one, which is 45 MB/sec).

The blade LV (/dev/vg0/blade1 for example) is striped over 2 PV's.
The Speed (hdparm -t) of /dev/vg0/blade1 is 122MB/sec.

The ext3 LV (/dev/vg0/data0) is striped over 4 PV's.
The Speed (hdparm -t) of /dev/vg0/data0 is 227 MB/sec.

But when copying from the blade LV to the ext3 LV:
dd if=/dev/vg0/blade1 of=/mnt/vg0/data0/vm/image/blade_v2.7.img
it takes about 70 seconds, which is about 75MB/sec.

but when copying back:
dd if=/mnt/vg0/data0/vm/image/blade_v2.7.img of=/dev/vg0/blade1
It takes about 390 seconds, which is about 13MB/sec
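
As a sanity check on the two rates above, the arithmetic can be redone from the LV size and the dd timings. This is just a sketch, assuming the blade LV is exactly 5 GiB (it was created as a 5GB LV) and that dd copied all of it both times; the function name is made up for illustration.

```python
def throughput_mib_per_s(size_gib, seconds):
    """Average copy rate in MiB/s for size_gib GiB moved in the given time."""
    return size_gib * 1024 / seconds


read_rate = throughput_mib_per_s(5, 70)    # blade LV -> image file
write_rate = throughput_mib_per_s(5, 390)  # image file -> blade LV
print(round(read_rate, 1), round(write_rate, 1))  # prints: 73.1 13.1
```

Under that assumption the figures agree with the roughly 75 MB/sec and 13 MB/sec quoted above, so writing back to the blade LV is about 5.6 times slower than reading from it.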

I think it has something to do with the ext3 LV being striped over 4 PVs in LVM.
So I will try to create a new ext3 LV striped across 2 PVs and see if
this is faster.

Robert Verspuy

--
*Exa-Omicron*
Patroonsweg 10
3892 DB Zeewolde
Tel.: 088-OMICRON (66 427 66)
http://www.exa-omicron.nl



------------------------------

Message: 2
Date: Tue, 11 Aug 2009 14:58:38 +0200
From: akapp <akapp fnds3000 com>
Subject: [Linux-cluster] Centos 5.3 X64 a& luci
To: linux-cluster redhat com
Message-ID: <34CB6BB9-6856-4E52-B946-5465F90B081C fnds3000 com>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

Good day

I have a Sun x4100 server running CentOS 5.3 x64, patched to the latest and
greatest.

When trying to start luci, it simply fails, with no error in /var/log and
nothing in /var/lib/luci/log.

I have re-installed luci and ricci a couple of times now, and cleaned out
/var/lib/luci & /rici between installations.

I have even tried a complete yum groupremove "Clustering" "Cluster
Storage" and re-installed the complete package group again.

I used the ricci/luci combination with great success in 5.2, but both
servers are giving the same problem.

Any pointers will be appreciated.


Here is a screen snippet of the problem:



Installed: luci.x86_64 0:0.12.1-7.3.el5.centos.1
Complete!
[root clu1 luci]# luci_admin init
Initializing the luci server


Creating the 'admin' user

Enter password:
Confirm password:

Please wait...
The admin password has been successfully set.
Generating SSL certificates...
The luci server has been successfully initialized


You must restart the luci server for changes to take effect.

Run "service luci restart" to do so

[root clu1 luci]# service luci restart
Shutting down luci:                                        [  OK  ]
Starting luci: Generating https SSL certificates...  done
                                                           [FAILED]
[root clu1 luci]#






Tks
Andre



------------------------------

Message: 3
Date: Tue, 11 Aug 2009 09:24:14 -0400 (EDT)
From: Bob Peterson <rpeterso redhat com>
Subject: [Linux-cluster] Re: Linux-cluster Digest, Vol 64, Issue 10
To: Wendell Dingus <wendell bisonline com>
Cc: linux-cluster redhat com
Message-ID:
<1891903567 443711249997054068 JavaMail root zmail06 collab prod int phx2 redhat com >
	
Content-Type: text/plain; charset=utf-8

----- "Wendell Dingus" <wendell bisonline com> wrote:
| Well, here's the entire list of blocks it ignored and the entire
| message section.
| Perhaps I'm just overlooking it but I'm not seeing anything in the
| messages
| that appears to be a block number. Maybe 1633350398 but if so it is
| not a match.

Your assumption is correct.  The block number was 1633350398, which
is labeled "bh = " for some reason.

| Anyway, since you didn't specifically say a new/fixed version of fsck
| was
| imminent and that it would likely fix this we began plan B today. We

Yesterday I pushed a newer gfs_fsck and fsck.gfs2 to their appropriate
git source repositories.  So you can build that version from source if
you need it right away.  But it sounds like it wouldn't have helped
your problem anyway.  What would really be nice is if there is a way
to recreate the problem in our lab.  In theory, this error could be
caused by a hardware problem too.

| plugged
| in another drive, placed a GFS2 filesystem on it and am actively
| copying files
| off to it now. Fingers crossed that nothing will hit a disk block that
| causes
| this again but I could be so lucky probably...

It's hard to say whether you'll hit it again.

Regards,

Bob Peterson
Red Hat File Systems



------------------------------

Message: 4
Date: Tue, 11 Aug 2009 10:55:48 -0400
From: "Robert Hurst" <rhurst bidmc harvard edu>
Subject: [Linux-cluster] RHEL 4.7 fenced fails -- stuck join state:
	S-2,2,1
To: "linux clustering" <linux-cluster redhat com>
Message-ID: <1250002549 2782 36 camel WSBID06223 bidmc harvard edu>
Content-Type: text/plain; charset="us-ascii"

Simple 4-node cluster; 2 nodes have had a GFS shared home directory mounted
for over a month.  Today, I wanted to mount /home on a 3rd node, so:

# service fenced start                [failed]

Weird.  Checking /var/log/messages shows:

Aug 11 10:19:06 cerberus kernel: Lock_Harness 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:16) installed
Aug 11 10:19:06 cerberus kernel: GFS 2.6.9-80.9.el4_7.10 (built Jan 22 2009 18:39:32) installed
Aug 11 10:19:06 cerberus kernel: GFS: Trying to join cluster "lock_dlm", "ccc_cluster47:home"
Aug 11 10:19:06 cerberus kernel: Lock_DLM (built Jan 22 2009 18:39:18) installed
Aug 11 10:19:06 cerberus kernel: lock_dlm: fence domain not found; check fenced
Aug 11 10:19:06 cerberus kernel: GFS: can't mount proto = lock_dlm, table = ccc_cluster47:home, hostdata =

# cman_tool services
Service          Name                              GID LID State     Code
Fence Domain:    "default"                           0   2 join      S-2,2,1
[]

So, a fenced process is now hung:

root 28302 0.0 0.0 3668 192 ? Ss 10:19 0:00 fenced -t 120 -w

Q: Any idea how to "recover" from this state, without rebooting?

The other two servers are unaffected by this (thankfully) and show
normal operations:

$ cman_tool services

Service          Name                              GID LID State     Code
Fence Domain:    "default"                           2   2 run       -
[1 12]

DLM Lock Space:  "home"                              5   5 run       -
[1 12]

GFS Mount Group: "home"                              6   6 run       -
[1 12]


------------------------------

Message: 5
Date: Tue, 11 Aug 2009 11:39:39 -0400
From: bergman merctech com
Subject: Re: [Linux-cluster] do I have a fence DRAC device?
To: linux clustering <linux-cluster redhat com>
Message-ID: <27649 1250005179 mirchi>
Content-Type: text/plain; charset=us-ascii



In the message dated: Tue, 11 Aug 2009 14:14:03 +0200,
The pithy ruminations from Juan Ramon Martin Blanco on
<Re: [Linux-cluster] do I have a fence DRAC device?> were:
=> On Tue, Aug 11, 2009 at 2:03 PM, ESGLinux <esggrupos gmail com> wrote:
=>
=> > Thanks
=> > I´ll check it when I could reboot the server.
=> >
=> > greetings,
=> >
=> You have a BMC ipmi in the first network interface, it can be configured at
=> boot time (I don't remember if inside the BIOS or pressing cntrl +something
=> during boot)
=>

Based on my notes, here's how I configured the DRAC interface on a Dell 1950
for use as a fence device:

Configuring the card from Linux depends on the installation of Dell's
OMSA package. Once that's installed, use the following
commands:

		racadm config -g cfgSerial -o cfgSerialTelnetEnable 1
		racadm config -g cfgLanNetworking -o cfgDNSRacName HOSTNAME_FOR_INTERFACE
		racadm config -g cfgLanNetworking -o cfgDNSDomainName DOMAINNAME_FOR_INTERFACE
		racadm config -g cfgUserAdmin -o cfgUserAdminPassword -i 2 PASSWORD
		racadm config -g cfgLanNetworking -o cfgNicEnable 1
		racadm config -g cfgLanNetworking -o cfgNicIpAddress WWW.XXX.YYY.ZZZ
		racadm config -g cfgLanNetworking -o cfgNicNetmask WWW.XXX.YYY.ZZZ
		racadm config -g cfgLanNetworking -o cfgNicGateway WWW.XXX.YYY.ZZZ
		racadm config -g cfgLanNetworking -o cfgNicUseDhcp 0


	I also save a backup of the configuration with:

		racadm getconfig -f ~/drac_config


Hope this helps,

Mark

----
Mark Bergman                              voice: 215-662-7310
mark bergman uphs upenn edu                 fax: 215-614-0266
System Administrator     Section of Biomedical Image Analysis
Department of Radiology            University of Pennsylvania
     PGP Key: https://www.rad.upenn.edu/sbia/bergman


=> Greetings,
=> Juanra
=>
=> >
=> > ESG
=> >
=> > 2009/8/10 Paras pradhan <pradhanparas gmail com>
=> >
=> > On Mon, Aug 10, 2009 at 5:24 AM, ESGLinux<esggrupos gmail com> wrote:
=> >> > Hi all,
=> >> > I was designing a 2 node cluster and I was going to use 2 servers DELL
=> >> > PowerEdge 1950. I was going to buy a DRAC card to use for fencing but
=> >> > running several commands in the servers I have noticed that when I run this
=> >> > command:
=> >> > #ipmitool lan print
=> >> > Set in Progress : Set Complete
=> >> > Auth Type Support : NONE MD2 MD5 PASSWORD
=> >> > Auth Type Enable : Callback : MD2 MD5
=> >> >                         : User : MD2 MD5
=> >> >                         : Operator : MD2 MD5
=> >> >                         : Admin : MD2 MD5
=> >> >                         : OEM : MD2 MD5
=> >> > IP Address Source : Static Address
=> >> > IP Address : 0.0.0.0
=> >> > Subnet Mask : 0.0.0.0
=> >> > MAC Address : 00:1e:c9:ae:6f:7e
=> >> > SNMP Community String : public
=> >> > IP Header : TTL=0x40 Flags=0x40 Precedence=0x00 TOS=0x10
=> >> > Default Gateway IP : 0.0.0.0
=> >> > Default Gateway MAC : 00:00:00:00:00:00
=> >> > Backup Gateway IP : 0.0.0.0
=> >> > Backup Gateway MAC : 00:00:00:00:00:00
=> >> > 802.1q VLAN ID : Disabled
=> >> > 802.1q VLAN Priority : 0
=> >> > RMCP+ Cipher Suites : 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14
=> >> > Cipher Suite Priv Max : aaaaaaaaaaaaaaa
=> >> >                         : X=Cipher Suite Unused
=> >> >                         : c=CALLBACK
=> >> >                         : u=USER
=> >> >                         : o=OPERATOR
=> >> >                         : a=ADMIN
=> >> >                         : O=OEM
=> >> > does this mean that I already have an ipmi card (not configured) that I can
=> >> > use for fencing? if the answer is yes, where the hell must I configure it? I
=> >> > don´t see where I can do it.
=> >> > If I haven´t a fencing device which one do you recommend to use?
=> >> > Thanks in advance
=> >> > ESG
=> >> >
=> >> > --
=> >> > Linux-cluster mailing list
=> >> > Linux-cluster redhat com
=> >> > https://www.redhat.com/mailman/listinfo/linux-cluster
=> >> >
=> >>
=> >> Yes you have IPMI and if you are using 1950 Dell, DRAC should be there
=> >> too. You can see if you have DRAC or not when the server starts and
=> >> before the loading of the OS.
=> >>
=> >> I have 1850s and I am using DRAC for fencing.
=> >>
=> >>
=> >> Paras.
=> >>
=> >>
=> >
=> >





------------------------------

--
Linux-cluster mailing list
Linux-cluster redhat com
https://www.redhat.com/mailman/listinfo/linux-cluster

End of Linux-cluster Digest, Vol 64, Issue 12
*********************************************


