mirrormanager future features

Mon Sep 3 23:05:52 UTC 2007

Measured against what I really wanted to see by the Fedora 7 release,
MirrorManager has been a success.  But there are still several gotchas
I'd like to iron out before Fedora 8.

* The mirrorlist mod_python applet consumes too much memory on the app
  servers.  It basically reads in a 2MB mirrorlist_cache pickle file
  that lists, by directory, which mirrors hold which content.
  Handy to have, but in mod_python, that blows the RSS size out to
  ~27MB per process, times all the httpd processes that have run that
  code, each with their own private copy.  Not pretty.

  The mirrormanager TurboGears backend isn't fast enough to handle all
  the client requests for mirrorlists, hence I exported the data for
  mod_python to use.  But the mod_python trick takes too much memory.

  The way out?  Split the mod_python applet into two pieces:

  1) Yet another daemon, listening on a local UNIX socket,
  that has a copy of the mirrorlist cache.  It calculates the answers
  to return.

  2) The mod_python applet connects to the daemon, passes its list
  of args, and gets back the answer list.  It handles redirects too.

  In this way, the daemon can fork() itself if necessary to handle the
  traffic, but the forked children use copy-on-write memory and will
  never modify the unpickled cache, so they'll all share mostly the
  same memory.  One copy of the mirrorlist_cache, used by all
  children.

  Since I'm saving so much RSS memory here, I can add back into the
  mirrorlist_cache all the directories which are being omitted
  now. So, we will be able to return the list for any dir or file that
  the public mirrors know about, not just a few as we do now.

  I've got a stab at this, but am still working on the details.  I'll
  want to do some time tests against the new code, to make sure it
  isn't too much slower for clients, but a quick swag shows it'll be
  OK; 0.3sec or so per request, even in parallel, which IMHO is "good
  enough".

* Mike's redirection stuff is included in the above already, so
  that'll be online as soon as the rest is.
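
As a rough illustration of the daemon/applet split described in the
first item (not the actual MirrorManager code), a minimal sketch in
present-day Python 3 might look like the following.  The cache layout
({directory: [mirror URLs]}), the one-line wire format, and the names
load_cache, lookup, Handler, MirrorlistDaemon and query are all
assumptions made for the example.

```python
import pickle
import socket
import socketserver

# Filled once in the parent before serving; forked children inherit it
# via copy-on-write instead of each re-reading the pickle.
CACHE = {}


def load_cache(path):
    """Read the pickle once. Assumed layout: {directory: [mirror URLs]}."""
    with open(path, "rb") as f:
        CACHE.update(pickle.load(f))


def lookup(directory):
    """Compute the answer for one request (here, a plain dict lookup)."""
    return CACHE.get(directory, [])


class Handler(socketserver.StreamRequestHandler):
    def handle(self):
        # Hypothetical wire format: the request is one line naming a
        # directory; the reply is one mirror URL per line.
        directory = self.rfile.readline().decode().strip()
        reply = "".join(url + "\n" for url in lookup(directory))
        self.wfile.write(reply.encode())


class MirrorlistDaemon(socketserver.ForkingMixIn,
                       socketserver.UnixStreamServer):
    """Listens on a local UNIX socket and fork()s a child per connection."""


def query(socket_path, directory):
    """What the thin web-server side would do: connect, ask, read reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(socket_path)
        s.sendall(directory.encode() + b"\n")
        with s.makefile("rb") as f:
            # The daemon closes the connection after replying, so read to EOF.
            return f.read().decode().splitlines()
```

The daemon side would call load_cache(path) once and then
MirrorlistDaemon(socket_path, Handler).serve_forever().  One caveat on
the memory sharing: CPython updates reference counts even when objects
are only read, so some pages will still be copied into the children
over time; "mostly shared" is the realistic expectation.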

Now, to find the time before F8...

Still to come, provided I find a lot of time (unlikely), or someone
else steps up to help:

* Add a way for mirrors to declare themselves always up-to-date.
  Setting this bit will probably require a sysadmin, as it's somewhat
  dangerous.  But there are cases, e.g. a local out-of-line squid
  proxy, where it makes sense to do it.  This requires a schema change
  and has repercussions throughout the code, so I haven't wanted to
  make it lightly.

* Some people want metalink support.  Conceptually it's possible, and
  even pretty easy once we've got the daemon above working right.  But
  as noted on f-d-l, it's been 10 weeks since someone asked for it and
  even sent some code that doesn't quite integrate but was a starting
  point, and I haven't had time to get to it.  It's not looking good
  for me to add that right now, but I'd be happy to review patches.

* I've wanted to add the libgeoip country->continent mappings, so we
  can fall back netblock -> country -> continent -> global, but I
  don't know C->Python bindings code at all, and I need that exported
  in python-GeoIP for mm to use.

* I have a pending request to change the fedora.repo files so that
  yum treats the mirror list as being in priority order.  I really
  want the continent mappings in place before doing that though...

  Should we let countries with <3 mirrors return their own lists?  Right
  now if a country has <3 mirrors, the users get the global list back.
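
To make the netblock -> country -> continent -> global fallback and
the "<3 mirrors" rule concrete, here is a small Python sketch.  It is
only an illustration: the sample country->continent table, the
function name pick_mirrors, and the choice to apply the same minimum
at the continent level are assumptions, not how mm works today.

```python
# Hypothetical sample data; real values would come from the GeoIP
# country->continent mappings mentioned above.
COUNTRY_TO_CONTINENT = {"US": "NA", "CA": "NA", "DE": "EU", "LU": "EU"}


def pick_mirrors(netblock_mirrors, country, by_country, global_mirrors,
                 min_mirrors=3):
    """Return (scope, mirrors) from the narrowest scope with enough mirrors."""
    # Assumption: any netblock match wins outright, regardless of count.
    if netblock_mirrors:
        return "netblock", list(netblock_mirrors)

    in_country = by_country.get(country, [])
    if len(in_country) >= min_mirrors:
        return "country", list(in_country)

    # Gather every mirror in a country that shares the client's continent.
    continent = COUNTRY_TO_CONTINENT.get(country)
    in_continent = []
    if continent is not None:
        for other, mirrors in sorted(by_country.items()):
            if COUNTRY_TO_CONTINENT.get(other) == continent:
                in_continent.extend(mirrors)
    if len(in_continent) >= min_mirrors:
        return "continent", in_continent

    return "global", list(global_mirrors)
```

With min_mirrors=3 a country holding a single mirror falls through to
its continent or to the global list; calling it with min_mirrors=1
would correspond to letting small countries return their own lists.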

Anything else people really need to see?

Thanks,
Matt

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux