Bad (2x) GFS read performance under x86_64 compared to i386 WAS: Re: [Linux-cluster] AW: GNBD multipath with devicemapper? -- possible solution

Hansjörg Maurer hansjoerg.maurer at dlr.de
Sun Apr 24 13:46:01 UTC 2005


Hi

I have done some further testing on this issue and noticed an
interesting behavior.
We have one installation with RHEL4 on i386 hardware (old SAN) and one
on new x86_64 hardware (two CPUs, each with hyperthreading).

On i386, read performance is better than write performance (under ext3
and GFS).
On x86_64, read performance is better than write performance under ext3,
but about two times worse than write performance under GFS (filesystem
created with lock_nolock).

Here is a short summary (times are min:sec; the i386 tests used 1000 MB
files, the x86_64 tests 2000 MB files):
i386 ext3
write:    1:16
read:    0:55
read:    0:43 (blockdev --setra 8192)

i386 gfs
write:    1:30
read:    1:08
read:    1:08 (blockdev --setra 8192)


x86_64 ext3
write:    0:35
read:    0:27
read:    0:18 (blockdev --setra 8192)

x86_64 gfs
write:    0:27
read:    1:03
read:    1:03 (blockdev --setra 8192)
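
For reference, the x86_64 numbers expressed as rough throughput (simply
the 2000 MB file size divided by the wall-clock times below):

ext3:  write ~57 MB/s   read ~111 MB/s (with setra 8192)
gfs:   write ~74 MB/s   read ~32 MB/s  (independent of readahead)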


The gnbd tests I did last week were under x86_64 too, so this might not
be a gnbd issue, but a GFS issue under x86_64.
The tests were done under 2.6.9-5.0.3.ELsmp and 2.6.9-6.38.ELsmp with
current GFS from CVS (RHEL4 tag), with no difference.

Can anyone reproduce this behavior?
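
For anyone who wants to try, here is a minimal sketch of the procedure
(device and mount point are examples; mkfile is not part of every
distribution, dd plus an explicit sync should give comparable numbers):

  DEV=/dev/sdb1          # block device under test (adjust)
  MNT=/mnt/tmp           # scratch mount point (adjust)
  mount -t gfs $DEV $MNT
  cd $MNT
  # write a file larger than RAM, so later reads cannot come from cache
  time sh -c 'dd if=/dev/zero of=a bs=1M count=2000 && sync'
  # read with the default readahead, then with a larger one
  blockdev --getra $DEV
  time cat a > /dev/null
  blockdev --setra 8192 $DEV
  time cat a > /dev/null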

Greetings

Hansjörg


Here are the detailed tests:


[root at chianti sda1]# uname -a
Linux chianti.itsd.de 2.6.9-6.38.EL #1 Wed Apr 13 01:36:09 EDT 2005 i686 
athlon i386 GNU/Linux


[root at chianti ~]# mkfs.ext3 /dev/sda1
mke2fs 1.35 (28-Feb-2004)
max_blocks 4294967295, rsv_groups = 0, rsv_gdb = 1024
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
5341184 inodes, 10673176 blocks
533658 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=12582912
326 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624

Writing inode tables: done
inode.i_blocks = 98312, i_size = 4243456
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first. Use tune2fs -c or -t to override.



[root at chianti ~]# mount /dev/sda1 /mnt/sda1/
[root at chianti ~]# cd /mnt/sda1
[root at chianti sda1]# free
             total       used       free     shared    buffers     cached
Mem:        515736     164504     351232          0      43544      25104
-/+ buffers/cache:      95856     419880
Swap:       786424        152     786272


[root at chianti sda1]# time mkfile 1000M a

real    1m16.242s
user    0m0.009s
sys     0m16.273s
[root at chianti sda1]# time mkfile 1000M b

real    1m17.738s
user    0m0.009s
sys     0m15.511s
[root at chianti sda1]# time cat a > /dev/null

real    0m55.025s
user    0m0.193s
sys     0m6.529s

[root at chianti sda1]# time cat b > /dev/null

real    0m54.241s
user    0m0.220s
sys     0m6.331s


[root at chianti sda1]# blockdev --setra 8192 /dev/sda1
[root at chianti sda1]# time cat a > /dev/null

real    0m43.565s
user    0m0.189s
sys     0m6.014s

[root at chianti sda1]# time cat b > /dev/null

real    0m43.214s
user    0m0.196s
sys     0m6.727s

[root at chianti ~]# gfs_mkfs -p lock_nolock -j 3 /dev/sda1
This will destroy any data on /dev/sda1.
  It appears to contain a EXT2/3 filesystem.

Are you sure you want to proceed? [y/n] y
Device:                    /dev/sda1
Blocksize:                 4096
Filesystem Size:           10573560
Journals:                  3
Resource Groups:           162
Locking Protocol:          lock_nolock
Lock Table:

Syncing...
All Done

[root at chianti sda1]# time mkfile 1000M a

real    1m27.421s
user    0m0.010s
sys     0m12.202s
[root at chianti sda1]# time mkfile 1000M b

real    1m35.009s
user    0m0.006s
sys     0m12.513s
[root at chianti sda1]# time cat a > /dev/null

real    1m12.609s
user    0m0.153s
sys     0m9.980s
[root at chianti sda1]# time cat b > /dev/null

real    1m7.989s
user    0m0.154s
sys     0m10.427s


[root at chianti sda1]# blockdev --setra 256 /dev/sda1
[root at chianti sda1]# time cat a > /dev/null

real    1m8.402s
user    0m0.082s
sys     0m8.841s
[root at chianti sda1]# time cat b > /dev/null

real    1m8.647s
user    0m0.124s
sys     0m9.565s
[root at chianti sda1]# blockdev --setra 8192 /dev/sda1
[root at chianti sda1]# time cat a > /dev/null

real    1m8.419s
user    0m0.115s
sys     0m9.262s



[root at rmvbs02 ~]# uname -a
Linux rmvbs02.cluster.robotic.dlr.de 2.6.9-5.0.3.ELsmp #1 SMP Sat Feb 19 
15:45:14 CST 2005 x86_64 x86_64 x86_64 GNU/Linux


[root at rmvbs02 ~]# mkfs.ext3 /dev/sdb1
mke2fs 1.35 (28-Feb-2004)
max_blocks 4294967295, rsv_groups = 131072, rsv_gdb = 977
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
97615872 inodes, 195221872 blocks
9761093 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
5958 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632,
        2654208, 4096000, 7962624, 11239424, 20480000, 23887872, 71663616,
        78675968, 102400000

Writing inode tables: done
inode.i_blocks = 140696, i_size = 4243456
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 38 mounts or
180 days, whichever comes first. Use tune2fs -c or -t to override.


[root at rmvbs02 tmp]# free
             total       used       free     shared    buffers     cached
Mem:       1026364      65960     960404          0        684      15544
-/+ buffers/cache:      49732     976632
Swap:      2739040       1360    2737680
[root at rmvbs02 tmp]# time mkfile 2000M a

real    0m35.560s
user    0m0.005s
sys     0m7.596s
[root at rmvbs02 tmp]# time mkfile 2000M b

real    0m30.917s
user    0m0.005s
sys     0m7.707s
[root at rmvbs02 tmp]# time mkfile 2000M c

real    0m40.663s
user    0m0.010s
sys     0m7.806s
[root at rmvbs02 tmp]# time cat a > /dev/null

real    0m28.787s
user    0m0.122s
sys     0m2.818s
[root at rmvbs02 tmp]# time cat b > /dev/null

real    0m27.468s
user    0m0.124s
sys     0m2.678s
[root at rmvbs02 tmp]# blockdev --getra /dev/sdb1
128
[root at rmvbs02 tmp]# blockdev --setra 8192 /dev/sdb1
[root at rmvbs02 tmp]# time cat c > /dev/null

real    0m18.541s
user    0m0.105s
sys     0m2.064s
[root at rmvbs02 tmp]# time cat a > /dev/null

real    0m18.464s
user    0m0.117s
sys     0m2.035s


[root at rmvbs02 ~]# gfs_mkfs -p lock_nolock -j 3 /dev/sdb1
This will destroy any data on /dev/sdb1.
  It appears to contain a EXT2/3 filesystem.

Are you sure you want to proceed? [y/n] y

Device:                    /dev/sdb1
Blocksize:                 4096
Filesystem Size:           195108656
Journals:                  3
Resource Groups:           2978
Locking Protocol:          lock_nolock
Lock Table:

Syncing...
All Done
[root at rmvbs02 ~]# mount -t gfs /dev/sdb1 /mnt/tmp/
[root at rmvbs02 ~]# cd /mnt/tmp/
[root at rmvbs02 tmp]# time mkfile 2000M a

real    0m24.616s
user    0m0.004s
sys     0m4.892s
[root at rmvbs02 tmp]# time mkfile 2000M b

real    0m27.023s
user    0m0.005s
sys     0m5.027s
[root at rmvbs02 tmp]# time mkfile 2000M c

real    0m29.205s
user    0m0.005s
sys     0m5.163s


[root at rmvbs02 tmp]# time cat a > /dev/null

real    1m4.698s
user    0m0.120s
sys     0m6.138s
[root at rmvbs02 tmp]# time cat b > /dev/null

real    1m2.958s
user    0m0.132s
sys     0m6.175s
[root at rmvbs02 tmp]# time cat c > /dev/null

real    1m2.867s
user    0m0.109s
sys     0m6.079s
[root at rmvbs02 tmp]# blockdev --getra  /dev/sdb1
8192
[root at rmvbs02 tmp]# blockdev --setra 256  /dev/sdb1
[root at rmvbs02 tmp]# time cat a > /dev/null

real    1m2.931s
user    0m0.101s
sys     0m6.073s
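
Note that on GFS the read time does not move at all with the readahead
setting, while on ext3 it clearly does. A quick loop to confirm this
(just a sketch; remounting between runs drops the page cache, since this
kernel offers no other easy way to do that):

  for ra in 128 256 1024 8192; do
      umount /mnt/tmp
      mount -t gfs /dev/sdb1 /mnt/tmp
      blockdev --setra $ra /dev/sdb1
      echo "readahead $ra:"
      time cat /mnt/tmp/a > /dev/null
  done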





Benjamin Marzinski wrote:

>On Thu, Apr 21, 2005 at 07:58:49AM +0200, Hansjörg Maurer wrote:
>
>>Hi
>>
>>thank you very much.
>>Do you have any idea what causes the speed difference between the
>>vanilla and RHEL4 kernels?
>>Should I officially file a bug in Bugzilla, or will you take care of
>>the problem?
>
>Go ahead and file a bug on gnbd if you are interested in tracking the status
>of this. I'll put it on my list of things to do.
>
>-Ben
>
>>>I got those numbers on a vanilla 2.6.11 kernel.
>>>Running on a 2.6.9-6.24.EL kernel, I get
>>>45.6608 MB/sec writes 
>>>35.8578 MB/sec reads
>>>
>>>Not as pronounced as your numbers, but a noticeable speed difference.
>>>
>>>-Ben



