[Pulp-list] Changing working_directory and/or reducing disk utilization during sync

Michael Hrivnak mhrivnak at redhat.com
Fri Mar 31 21:54:42 UTC 2017


I just looked at the repo, and other.xml is 720MB compressed!!! Wow! I
wonder what's in there.

For comparison, just for fun, I checked RHEL 6.8. The other.xml file there
is under 5MB compressed.

The setting to change where the working directory lives is intended to help
in a scenario where you're using a slow or high-latency network filesystem
in a Pulp cluster. It allows a worker process to potentially use fast local
storage for transient data. But you do pay a small price for having to
eventually copy some data from that filesystem to the shared one.

Thus on a single-machine deployment, it pays to have /var/cache/pulp on the
same filesystem as /var/lib/pulp.
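
As a quick way to check that on a given box, here is a minimal sketch that
compares the device IDs of two directories. The helper name
same_filesystem is made up for illustration and is not part of Pulp; the
only assumption is that both paths already exist.

```python
import os


def same_filesystem(path_a, path_b):
    """Return True if both existing paths report the same device ID.

    Hypothetical helper, not part of Pulp. Paths on the same mounted
    filesystem normally share one st_dev value.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev
```

For example, same_filesystem("/var/cache/pulp", "/var/lib/pulp") would be
the call to make on a default install. Some setups (btrfs subvolumes, for
instance) can report different device IDs for what is physically one
volume, so treat the result as a hint rather than proof.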

When changing the setting, restarting services is all you need to do,
besides, of course, ensuring that the "apache" user can write to the new
location.
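
One rough way to sanity-check that before restarting is sketched below. It
only looks at the classic owner/group/other permission bits on the
directory itself, so it says nothing about ACLs, SELinux contexts, or
permissions on parent directories. Both function names are hypothetical,
and the "apache" default is simply the user named above.

```python
import os
import pwd
import stat


def mode_allows_write(st, uid, gids):
    """Check plain permission bits for creating files in a directory.

    st is an os.stat() result for the directory, uid is the numeric user
    ID, and gids is a set of that user's group IDs. Creating files needs
    both the write and the search (execute) bit. Hypothetical helper that
    ignores ACLs and SELinux.
    """
    if uid == 0:
        return True  # root bypasses ordinary permission bits
    if st.st_uid == uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif st.st_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return (st.st_mode & needed) == needed


def user_can_write_dir(path, username="apache"):
    """Look up username and apply mode_allows_write to path."""
    user = pwd.getpwnam(username)
    gids = set(os.getgrouplist(user.pw_name, user.pw_gid))
    return mode_allows_write(os.stat(path), user.pw_uid, gids)
```

Calling user_can_write_dir with the new working directory path should
return True if the permission bits are set up correctly. A True result is
necessary but not sufficient, since SELinux or an ACL can still deny the
write.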

Otherwise, there's no option for reducing Pulp's disk usage during sync. It
has to download that 720MB file, and it does end up storing all of that
data on disk uncompressed temporarily while the sync takes place. In theory
we could modify that workflow to store those temporary data blobs (one for
each RPM) compressed in the working directory. But it's not currently
optimized for gigantic metadata files, and I'm not sure if it would be
worth adding that complexity and overhead for a rare use case.
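
To get a feel for how much a metadata file expands, the sketch below
streams a gzip-compressed file and counts the decompressed bytes without
writing anything to disk. It assumes the file is gzip-compressed (repo
metadata can also use other formats), and the function name is made up
for illustration. The number is only a rough indicator of the temporary
space a sync might need, not an exact measure of what Pulp stores.

```python
import gzip


def uncompressed_size(gz_path, chunk_size=1024 * 1024):
    """Return the decompressed size in bytes of a gzip file.

    Reads the file in chunks so that memory use stays bounded even for
    very large metadata files. Hypothetical helper, not part of Pulp.
    """
    total = 0
    with gzip.open(gz_path, "rb") as fh:
        while True:
            chunk = fh.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total
```

Comparing that figure with the free space reported by
shutil.disk_usage() for the working directory gives a first estimate of
whether a sync is likely to fit.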

Michael

On Wed, Mar 29, 2017 at 9:57 AM, Christina Plummer <cplummer at gmail.com>
wrote:

>
> We found the "working_directory" setting in server.conf, but couldn't find
> much documentation about it. Since this is a production system, I wanted to
> check with the list first to confirm:
> 1) Will changing this to a location on a different, larger filesystem
> address my issues with /var utilization spikes during repo sync?
> 2) Are there any special considerations to changing this setting, other
> than restarting all the services?  Do I need to copy the subdirectories? Is
> a symlink a bad idea? It looks like the SELinux context probably needs to
> be set to pulp_var_cache_t.
> 3) Is there another way to reduce Pulp's utilization during the sync? This
> repo seems to be particularly egregious in terms of the massive size of the
> uncompressed other.db and filelists.db for some reason.
>
> Thanks,
> Christina
>
> _______________________________________________
> Pulp-list mailing list
> Pulp-list at redhat.com
> https://www.redhat.com/mailman/listinfo/pulp-list
>

