
Re: [Pulp-list] pulp v1 vs pulp v2 rpm repo sync times

On 02/25/2013 02:35 PM, Mike McCune wrote:
On 02/25/2013 10:29 AM, Randy Barlow wrote:
On Mon, 25 Feb 2013, Jay Dobies wrote:
Touch base with Randy. He tweaked a bunch of those numbers before v2
and should be able to point you to the best places to start playing

I had done some lab testing with the --num-threads parameter in
December. I had learned that our Grinder code was using threads for CPU
intensive work during downloads. Due to the Python GIL, this was
actually causing the threads to thrash each other, which significantly
lowered performance for synchronization.
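
The GIL effect described above can be reproduced with a small sketch. This is not Grinder code: `checksum_like_work` and `run_with_threads` are hypothetical names, and the pure-Python loop only stands in for CPU-intensive work. On a standard CPython build with the GIL, adding threads to this kind of work generally does not shorten the wall-clock time.

```python
import threading
import time


def checksum_like_work(n):
    """Pure-Python CPU-bound loop (hypothetical stand-in); it holds the GIL while running."""
    total = 0
    for i in range(n):
        total = (total + i * i) % 1000003
    return total


def run_with_threads(num_threads, jobs):
    """Spread jobs over num_threads threads; return (results, elapsed seconds)."""
    results = [None] * len(jobs)

    def worker(start):
        # Each worker takes every num_threads-th job, starting at its own index.
        for idx in range(start, len(jobs), num_threads):
            results[idx] = checksum_like_work(jobs[idx])

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    began = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - began


if __name__ == "__main__":
    jobs = [200_000] * 8
    for count in (1, 4):
        _, elapsed = run_with_threads(count, jobs)
        print(f"{count} thread(s): {elapsed:.2f}s")
```

The actual timings depend on the machine and the Python version, so none are quoted here.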

For my test, I used traffic control to limit my bandwidth to 20 Mbps, 10
Mbps, and 1 Mbps between myself and a LAN-reachable CentOS repository,
and in all three cases I found that having one thread resulted in the
best performance. Due to this finding, I set the default number of
threads to 1. The --num-threads flag can be used to override the default.

One thing I did not simulate in my testing was network latency. If there
were high network latency, I would guess that adding more threads might
eventually lead to better performance, as more of them would be in a
waiting state instead of thrashing each other. I didn't have much time to
simulate this scenario, so if you find that adding threads improves
performance, it might be due to latency. I'd like to know if that does
help, as it would warrant another test.
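
The latency guess above can be illustrated with a minimal sketch, assuming each download is dominated by waiting rather than CPU work. `fake_download` and `timed_sync` are hypothetical names, and `time.sleep` only simulates network round trips; this is not how Pulp or Grinder performs downloads.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(latency_seconds):
    """Stand-in for a network request: sleeping releases the GIL while it waits."""
    time.sleep(latency_seconds)
    return latency_seconds


def timed_sync(num_threads, latencies):
    """Run all fake downloads on a pool of num_threads; return elapsed seconds."""
    began = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        list(pool.map(fake_download, latencies))
    return time.perf_counter() - began


if __name__ == "__main__":
    latencies = [0.1] * 8  # eight simulated requests at 100 ms each
    for count in (1, 4):
        print(f"{count} thread(s): {timed_sync(count, latencies):.2f}s")
```

Because the waiting threads do not compete for the GIL, the waits overlap and the multi-threaded run finishes sooner in this simulation. Whether a real sync behaves the same way depends on how much CPU work is mixed in with the waiting.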


In the first test I mentioned in the initial post there was no real
network latency, since it was all over my local GigE network.

I repeated the above test with the addition of 4 threads and it went
from ~3m -> 2m20s

so for very low latency syncs:

Pulp V1                       : 1m18s
Pulp V2 with 4 threads        : 2m20s
Pulp V2 default with 1 thread : 3m12s

repeating my test with a larger network latency between the pulp
server and the remote repo (roughly 500K/sec download speed, 100ms
ping) and the difference actually gets much wider:

high latency sync:

Pulp V1                       : 5m10s
Pulp V2 with 4 threads        : 7m27s
Pulp V2 default with 1 thread : 25m18s

so, by default it is *really* bad; with a bit of tuning it gets much
better, but it is still slower than Pulp V1.
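
To put the two sets of timings above on a common scale, here is a small sketch that expresses each result as a multiple of the Pulp V1 time. The helper names `to_seconds` and `slowdown_vs_v1` are made up for illustration; the durations are the ones quoted in this message.

```python
import re


def to_seconds(duration):
    """Convert a string such as '5m10s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)m)?(?:(\d+)s)?", duration)
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {duration!r}")
    minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return minutes * 60 + seconds


def slowdown_vs_v1(timings):
    """Return each configuration's time divided by the Pulp V1 time."""
    baseline = to_seconds(timings["Pulp V1"])
    return {name: round(to_seconds(t) / baseline, 2) for name, t in timings.items()}


# Timings as reported in this thread.
LOW_LATENCY = {
    "Pulp V1": "1m18s",
    "Pulp V2, 4 threads": "2m20s",
    "Pulp V2, 1 thread": "3m12s",
}
HIGH_LATENCY = {
    "Pulp V1": "5m10s",
    "Pulp V2, 4 threads": "7m27s",
    "Pulp V2, 1 thread": "25m18s",
}

if __name__ == "__main__":
    for label, timings in (("low latency", LOW_LATENCY), ("high latency", HIGH_LATENCY)):
        print(label, slowdown_vs_v1(timings))
```

With these figures, the single-threaded V2 sync takes roughly 2.5x the V1 time on the low latency link and roughly 4.9x on the high latency link, while the 4-thread run takes roughly 1.8x and 1.4x respectively.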

To give an example of a larger repo with latency, the RHEL 6Server x86_64 repo:

Pulp V1 (I believe 4 threads) : 108 minutes
Pulp V2 with 4 threads        : 192 minutes
Pulp V2 with 1 thread         : 300 minutes

