[Pulp-list] Children resource usage

Brian Bouterse bbouters at redhat.com
Fri Jun 26 18:35:28 UTC 2015



Hi Sean,

I don't know specifically which version it will land in, but at the
earliest it would be 2.8.0. You can follow the progress by watching
these two issues [0] [1].

[0]: https://pulp.plan.io/issues/1060
[1]: https://pulp.plan.io/issues/898

-Brian



On 06/26/2015 10:07 AM, Sean Waite wrote:
> Thanks Brian and Salvatore. We're using it in much the same
> fashion, with Pulp just acting as an https target for yum and
> managing everything on the backend. I'll read the clustering
> guide; it could be useful. Any idea what release the HA celerybeat
> and resource_manager are targeted for?
> 
> Sean
> 
> On Wed, Jun 24, 2015 at 12:51 PM, Brian Bouterse 
> <bbouters at redhat.com> wrote:
> 
> Salvatore,
> 
> Thanks for the note describing the setup you've done. It's great
> to see users clustering Pulp!
> 
> I've done work with clustering Pulp (starting with 2.6.1) and put 
> together a clustering guide [0] which was tested by QE (and me).
> 
> Pulp still has two single points of failure (pulp_celerybeat and
> pulp_resource_manager), but we're working on fixing those in a
> future version of Pulp.
> 
> Even after fixing those issues, Pulp will still have trouble
> guaranteeing consistency when using a replica set with mongodb.
> That is going to be harder to fix, and we're still in the planning
> phase. You can follow that issue, its discussion, and its
> subissues here [1]. That being said, it should *mostly* work
> today, but your mileage may vary.
> 
> Generally, the clustering doc [0] is the preferred way to scale 
> Pulp within a single data center or over low latency network 
> connections. You didn't use nodes, but to clarify for others the 
> nodes feature is more for replicating content data between data 
> centers or if one of the Pulp installations needs to be network 
> disconnected from the other.
> 
> [0]: http://pulp.readthedocs.org/en/latest/user-guide/scaling.html#clustering-pulp
> [1]: https://pulp.plan.io/issues/1014
> 
> -Brian
> 
> On 06/23/2015 04:08 AM, Salvatore Di Nardo wrote:
>> I'm not an expert here as I started working with Pulp only
>> recently, but I tried to install a 2-3 server configuration with a
>> master and 1 or 2 clients. The aim was to spread the load (in an
>> active-active configuration, so not a clustered configuration) and
>> to avoid a single point of failure.
> 
>> Sadly, I got stuck on the fact that nodes need OAuth
>> authentication, but it was not working properly, and other pages
>> declared OAuth deprecated and soon to be removed from Pulp.
> 
>> How nodes are supposed to work, then, is a mystery. Since the
>> documentation contradicted itself and I didn't manage to make it
>> work (SSL issues even though I had disabled SSL everywhere), I
>> opted for a totally different approach:
> 
>> I created a single Pulp server and mounted a NAS volume.
> 
>> I moved /var/lib/pulp and /var/lib/mongodb to the NAS and replaced
>> those paths with NFS mounts. Symbolic links could work for mongodb,
>> but not for Pulp, as some paths need to be served by Apache, which
>> by default doesn't follow symlinks.
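>> 
>> Roughly, the relocation looked something like the sketch below (the
>> NAS hostname and export paths are placeholders, not my real ones,
>> and I've left out the SELinux context and ownership fix-ups):
>> 
>>     # stop everything that writes to the data directories first
>>     sudo service httpd stop
>>     for s in pulp_workers pulp_celerybeat pulp_resource_manager mongod; do
>>         sudo service $s stop
>>     done
>> 
>>     # copy the data onto the NAS export (temporarily mounted on /mnt/nas)
>>     sudo rsync -a /var/lib/pulp/ /mnt/nas/pulp/
>>     sudo rsync -a /var/lib/mongodb/ /mnt/nas/mongodb/
>> 
>>     # mount the NAS back over the original paths instead of symlinking
>>     echo "nas.example.com:/export/pulp     /var/lib/pulp     nfs defaults 0 0" | sudo tee -a /etc/fstab
>>     echo "nas.example.com:/export/mongodb  /var/lib/mongodb  nfs defaults 0 0" | sudo tee -a /etc/fstab
>>     sudo mount -a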
> 
>> Once the Pulp data was located on the NAS, I exported that volume
>> to 2 more Apache servers and made the same 'published' directory
>> available through those Apache servers (you can reuse the pulp.conf
>> in /etc/httpd/conf.d, as it needs only minor changes). All the
>> clients actually connect to the Apache servers, so I can scale
>> horizontally as much as I want, and the Pulp server only does the
>> repo syncs, so its load is actually quite low.
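>> 
>> On each extra Apache server the gist is just this (again, the
>> hostnames and paths are only illustrative):
>> 
>>     # mount the shared Pulp data read-only so the published repos are visible
>>     echo "nas.example.com:/export/pulp  /var/lib/pulp  nfs ro 0 0" | sudo tee -a /etc/fstab
>>     sudo mount -a
>> 
>>     # copy over the pulp.conf mentioned above from the real Pulp server,
>>     # keep only the parts that serve the published content, then restart Apache
>>     sudo rsync root@pulpserver:/etc/httpd/conf.d/pulp.conf /etc/httpd/conf.d/pulp.conf
>>     sudo service httpd restart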
> 
>> The good: with this configuration the Pulp server can be
>> restarted, reinstalled, or shut down and the repos will still be
>> available to the hosts, as they connect to the Apache servers.
>> This helps with Pulp maintenance. Having Pulp unavailable only
>> means that there will be no new syncs to update the repositories,
>> but the repos stay available.
> 
>> The bad: this is all nice, but only if you use Pulp as a pure rpm
>> repo manager. If you also use Pulp to register the hosts, then
>> this configuration is of no use to you. Since the hosts have to
>> register, they have to connect to the Pulp server, and only Pulp
>> can 'push' changes to the hosts, so the single point of failure
>> comes back.
> 
>> The workaround (no, it's not "ugly" :) ): in my work environment
>> we use Puppet to define the server configuration and the running
>> services, so we can rebuild servers automatically without manual
>> intervention. This includes the repo configuration and the
>> installed packages, so we don't need to register hosts in specific
>> host groups, as Puppet does everything (better).
> 
>> Actually, during my host registration tests I didn't like the
>> logic behind it. We host several thousand hosts and we need to be
>> able to reinstall them when needed without manual intervention.
>> Puppet copes with that, so when I was looking at how to register a
>> host I was surprised that a host cannot register itself into a
>> specific host group. You have to do that by hand on the Pulp
>> server (more exactly: using pulp-admin). So any time a machine
>> registers itself there is some manual task on Pulp, which is not
>> scalable for us, so in the end we skipped this part, used Pulp
>> just as a local rpm repo, and continued to use Puppet for the rest.
> 
> 
>> On 22/06/15 15:11, Sean Waite wrote:
>>> By children, I'm referring to child nodes - the subservers
>>> that can sync from a "parent" node.
>>> 
>>> Looking again at the resources, below is what I have. It does 
>>> look like the 1.7g proc is actually a worker.
>>> 
>>> Some statistics on what I have here (resident memory):
>>> - 2 celery__main__worker procs listed as "resource_manager": 41m each
>>> - 2 celery__main__worker procs listed as "reserved_resource_worker": 42m and 1.7g respectively
>>> - 1 mongo process: 972m
>>> - 1 celerybeat: 24m
>>> - a pile of httpd procs: 14m each
>>> - 1 qpid: 21m
>>> 
>>> For disk utilization, the mongo db is around 3.8G and my 
>>> directory containing all of the rpms etc is around 95G.
>>> 
>>> We're on a system with only 3.5G of available memory, which is
>>> probably part of the problem. We're looking at expanding it; I'm
>>> just trying to figure out how much to expand it by. From your
>>> numbers above, we'd need 6-7G of memory + 2*N gigs for the
>>> workers. Should I expect maybe 3-4 workers at any one time?
>>> I've got 2 now, but that is in an idle state.
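>>> 
>>> Working it through with the upper end of your per-process figures
>>> (just my back-of-the-envelope math, so correct me if the
>>> assumptions are off): 0.5G celerybeat + 0.5G resource_manager +
>>> 1G httpd + 4G mongodb = 6G baseline, plus up to 2G per busy
>>> worker, so 2 workers would be ~10G and 4 workers ~14G.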
>>> 
>>> 
>>> On Mon, Jun 22, 2015 at 9:24 AM, Brian Bouterse 
>>> <bbouters at redhat.com> wrote:
>>> 
>> Hi Sean,
> 
>> I'm not really sure what you mean by the term 'children'. Maybe
>> you mean a process or a consumer?
> 
>> I expect pulp_resource_manager to use less than 1.7G of memory,
>> but it's possible. Memory analysis can be a little bit tricky, so
>> more details are needed about how this is being measured to be
>> sure.
> 
>> The biggest memory consumer within Pulp by far is mongodb. If you
>> can, ensure that at least 4G of RAM is available on the machine
>> you are running mongodb on.
> 
>> I looked into the docs and we don't talk much about the memory 
>> requirements. Feel free to file a bug on that if you want. 
>> Roughly I expect the following amounts of RAM to be available per
>> process:
> 
>> pulp_celerybeat, 256MB - 512MB
>> pulp_resource_manager, 256MB - 512MB
>> pulp_workers: this process spawns N workers; each worker could use
>> 256MB - 2GB depending on what it's doing
>> httpd, 1GB
>> mongodb, 4GB
>> qpidd/rabbitMQ, ???
> 
>> Note that all the pulp_* processes have a parent and a child
>> process; for the numbers above I consider each parent/child pair
>> together. I usually show the inheritance using `sudo ps -awfux`.
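>> 
>> As a sketch of how I'd eyeball it (the process name patterns below
>> are just what I'd expect to see; adjust them to whatever ps shows
>> on your box):
>> 
>>     # show the parent/child trees for the pulp-related processes
>>     sudo ps -awfux | grep -E 'celery|mongod|httpd|qpidd' | grep -v grep
>> 
>>     # total resident memory of all celery processes (workers,
>>     # resource manager, and beat), in MB
>>     ps -C celery -o rss= | awk '{s+=$1} END {printf "%.0f MB\n", s/1024}'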
> 
>> I'm interested to see what others think about these numbers too.
> 
>> -Brian
> 
> 
>> On 06/22/2015 08:46 AM, Sean Waite wrote:
>>> Hi,
> 
>>> I've got a pulp server running, and I'd like to add some
>>> children. The server itself is a bit hard up on resources, so
>>> we're going to rebuild with a larger one. How many resources
>>> would the children use? Are they fairly beefy process/memory
>>> hogs?
> 
>>> We've got a large number of repositories.
>>> pulp-resource-manager seems to be using 1.7G of memory, with
>>> about .7G for mongodb.
> 
>>> Any pointers on how much I might be able to expect?
> 
>>> Thanks
> 
>>> -- Sean Waite swaite at tracelink.com Cloud Operations
>>> Engineer                GPG 3071E870 TraceLink, Inc.
> 
>>> Be Excellent to Each Other
> 
> 
>>> -- Sean Waite swaite at tracelink.com Cloud
>>> Operations Engineer                GPG 3071E870 TraceLink,
>>> Inc.
>>> 
>>> Be Excellent to Each Other
>>> 
>>> 
> -- Sean Waite swaite at tracelink.com
> Cloud Operations Engineer                GPG 3071E870 TraceLink, 
> Inc.
> 
> Be Excellent to Each Other