
InstantMirror Proposal Re: ApacheMirror.py for a site-local Fedora mirror



Ed Swierk wrote:
Having tired of babysitting the rsync cron job that was keeping my
local Fedora mirror up-to-date, I tried the caching proxy approach
suggested at http://fedoraproject.org/wiki/Infrastructure/Mirroring/SiteLocalMirrors
for a few weeks. This, too, was unsatisfactory--I still want some
control over the mirrored content and the ability to pre-populate the
cache from a DVD ISO acquired via bittorrent when a new version of
Fedora is released.

ApacheMirror.py is a mod_python request handler that behaves like a
caching proxy, except it maps the URL path of a cached document
directly to a local directory rather than hashing the URL, thus
preserving the mirror directory structure.
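
The path mapping described above could look roughly like the sketch below. This is not the code from ApacheMirror.py; the function name local_path_for and the use of Python 3 are assumptions made for illustration only.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit


def local_path_for(url, mirror_root):
    """Map a request URL to a file under mirror_root, keeping the upstream layout."""
    # Decode %-escapes and collapse "." / ".." segments in the URL path.
    path = posixpath.normpath(unquote(urlsplit(url).path))
    # Drop empty, "." and ".." segments so the result cannot escape mirror_root.
    parts = [p for p in path.split("/") if p not in ("", ".", "..")]
    return os.path.join(mirror_root, *parts)
```

Because the cached file sits at the same relative path as upstream, the cache directory can be pre-populated by copying a tree into place.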

Just drop ApacheMirror.py into /usr/lib/python*/site-packages, set
your preferred upstream server and point it at a local directory on a
nice big disk, and forget it:

<VirtualHost *:80>
   ServerName mirrors.sample.com
   ServerAlias mirrors
   DocumentRoot /mirrors

   SetHandler mod_python
   PythonHandler ApacheMirror
   PythonDebug on
   PythonOption ApacheMirror.upstream http://download.fedora.redhat.com
</VirtualHost>

The implementation is by no means bulletproof--consider this release
0.1--but it's worked well enough to serve local yum needs for the past
few days.

If there's interest, I could package up the script into an srpm (which
seems overkill for 50 lines of Python) or submit it as a patch to some
existing package.

--Ed


Excellent, I was hoping for something like this! I had played a bit with both squid and varnish, but neither was fully satisfactory because they can't easily store your cache in the original directory structure without writing your own backend storage engine.

http://fedoraproject.org/wiki/Infrastructure/ProjectHosting/RequestingNewProject
Could you please create an "upstream" project for it at hosted.fedoraproject.org? I think there are a number of improvements that can be made.

I didn't read deeply into your code yet, but I imagine that it needs improvement to handle unique synchronization and expiration issues that yum repos and rawhide install trees create when file contents change without changing filenames.

Perhaps a separate, asynchronous daemon can monitor upstream (via HTTP or whatever) for repomd.xml changes. It should then parse the repomd.xml so it knows when to expire the repodata/* files. Then it should parse the .xml files in repodata/ to compare them to local storage, and intelligently expire the packages if any changed (as happens during signing). It can then know exactly which files to delete from the local cache because they are no longer in the upstream. This daemon interacts with ApacheMirror.py only by deleting files from the local directories, effectively expiring the cache. Very simple.
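
A rough sketch of the first step, deciding which repodata/* files to expire, might compare the cached repomd.xml against a freshly fetched copy. The function names here are hypothetical, and the sketch assumes each <data> entry carries a <location href="..."/> and a <checksum> element in the usual repomd namespace.

```python
import xml.etree.ElementTree as ET

# XML namespace used by repomd.xml.
REPO_NS = "{http://linux.duke.edu/metadata/repo}"


def checksums(repomd_xml):
    """Map each location href in a repomd.xml document to its checksum text."""
    root = ET.fromstring(repomd_xml)
    result = {}
    for data in root.iter(REPO_NS + "data"):
        loc = data.find(REPO_NS + "location")
        csum = data.find(REPO_NS + "checksum")
        if loc is not None and csum is not None:
            result[loc.get("href")] = (csum.text or "").strip()
    return result


def files_to_expire(cached_repomd, upstream_repomd):
    """Return cached hrefs whose checksum changed upstream or that upstream dropped."""
    old = checksums(cached_repomd)
    new = checksums(upstream_repomd)
    return sorted(href for href, csum in old.items() if new.get(href) != csum)
```

The daemon would then delete those paths (and the cached repomd.xml itself) from the local tree, so the next request fetches fresh copies. Expiring individual packages would need the same comparison applied to the package metadata, which this sketch does not attempt.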

That daemon could be configured to handle intelligent expiry of various parts of the mirror tree in different ways. For example:
- The development (rawhide) repo changes at least once per day. It also contains install images (boot.iso, bootdisk.img, stage2, etc.) that need to be expired every time the tree changes. (We might need to add a hashes file to the mirror tree to allow the tool to monitor these changes.)
- Released distros never change, so there is no need to monitor their repomd.xml for changes.
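
One way to express such per-tree behaviour is a small policy table keyed by path prefix. The prefixes and intervals below are illustrative assumptions, not proposed defaults, and poll_interval is a hypothetical helper.

```python
# Hypothetical policy table: path prefix -> seconds between repomd.xml checks.
# None means "never poll" (released trees do not change).
POLICIES = {
    "pub/fedora/linux/development": 3600,
    "pub/fedora/linux/releases": None,
}
# Assumed fallback for trees with no explicit policy.
DEFAULT_INTERVAL = 86400


def poll_interval(path):
    """Return the polling interval for path, using the longest matching prefix."""
    path = path.strip("/")
    best = None
    for prefix in POLICIES:
        if path == prefix or path.startswith(prefix + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return DEFAULT_INTERVAL if best is None else POLICIES[best]
```

Keeping the policy as plain data would let a packaged definition file ship updated prefixes and intervals without touching the daemon itself.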

Please create an upstream project at hosted.fedoraproject.org and let's get started on this! Here you get to choose a project name for your new "upstream" project. I personally would choose something really obvious like InstantMirror... but you get to choose.

The default definitions for mirroring download.fedoraproject.org could be included in a Fedora/EPEL package that requires ApacheMirror.py and the monitor/expiry daemon. That way a sysadmin who wants to create an instant Fedora mirror need only install that package and enable it in /etc/httpd/conf.d/. yum update then handles pulling in updates for tree changes (repo locations, how often to poll for repomd.xml changes, etc.).

Example:
yum install InstantMirror-fedora
vim /etc/httpd/conf.d/InstantMirror-fedora.conf
#(enable stuff)
service httpd restart
# http://fedora.localdomain.com
Instant Fedora mirror!

InstantMirror-fedora.noarch.rpm    : instant Fedora mirror
InstantMirror-centos.noarch.rpm    : instant CentOS mirror
InstantMirror-rpmfusion.noarch.rpm : instant RPMFusion mirror
InstantMirror-foo.noarch.rpm       : instant Foo mirror

Warren Togami
wtogami redhat com

p.s.
The same code could be used to create a public static-repos mirror. static-repos changes many times per hour, so the daemon would probe for changes every 2 minutes. We need a few permanent public mirrors of this so people stop hitting the koji server directly. Any public mirror interested in hosting this?

p.p.s.
Another idea before I forget about it:
Later add configurable fallbacks to a different upstream if download.fp.org is down. mirrors.kernel.org might be a good alternative for default, for example.
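
A minimal sketch of such a fallback, assuming a simple ordered list of upstreams tried in turn. The function name, the opener parameter, and the timeout are hypothetical choices for illustration.

```python
import urllib.request

# Assumed ordering; the second entry is the alternative suggested above.
UPSTREAMS = [
    "http://download.fedoraproject.org",
    "http://mirrors.kernel.org",
]


def fetch_with_fallback(path, upstreams=UPSTREAMS,
                        opener=urllib.request.urlopen, timeout=10):
    """Try each upstream in order and return the first successful response body."""
    last_error = None
    for base in upstreams:
        url = base.rstrip("/") + "/" + path.lstrip("/")
        try:
            with opener(url, timeout=timeout) as response:
                return response.read()
        except OSError as exc:  # URLError and socket timeouts derive from OSError
            last_error = exc
    raise RuntimeError("all upstreams failed for %s" % path) from last_error
```

Note that this treats any HTTP error, including a 404, as a reason to try the next upstream. Different mirrors may also publish the tree under different path prefixes, so a real configuration would probably need a per-upstream base path.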

