MirrorManager crawler patch

Ricky Zhou ricky at fedoraproject.org
Mon Jul 20 05:27:11 UTC 2009


On 2009-07-20 12:28:34 AM, Ricky Zhou wrote:
> I just took a closer look at this with Matt, and it turns out that my
> extra code in this patch shouldn't be necessary (and in fact, doesn't
> seem to run at all).  I'm going to look at testing this more on another
> outdated site.
Hi, Matt and I just spoke more on IRC, and I think we have a slightly
better idea of the issue now.  I think mirrormanager returning outdated
mirrors might actually have been related to the mounts issue as well.

One issue we noticed is that for F10 updates, yum uses a URL similar
to

http://mirrors.fedoraproject.org/mirrorlist?repo=updates-released-f$releasever&arch=$basearch

to find the directory to get repodata from.  However, this returns the
path to pub/fedora.redhat/linux/updates/10/x86_64 on mirrors, and while
that directory may appear up to date (mirrormanager only checks the 10
newest files in it, and the recent timestamp issues may have made this
test unreliable), its repodata subdirectory,
pub/fedora.redhat/linux/updates/10/x86_64/repodata, may not be.

In the case of the bu mirror, we found that
pub/fedora.redhat/linux/updates/10/x86_64/repodata was properly marked
outdated, but pub/fedora.redhat/linux/updates/10/x86_64 was not.
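
A rough sketch of the rule this suggests: if a repodata directory is
outdated, treat the repository directory above it (and everything under
it) as outdated too.  This is not MirrorManager's actual code; the
function name and the path -> bool dictionary are made up purely for
illustration.

```python
import posixpath


def mark_repo_outdated_if_repodata_is(up2date):
    """Return a copy of `up2date` (hypothetical mapping: path -> bool).

    Wherever a directory named "repodata" is outdated, its parent
    directory and everything below that parent is marked outdated.
    """
    result = dict(up2date)
    for path, fresh in up2date.items():
        if posixpath.basename(path) == "repodata" and not fresh:
            repo_root = posixpath.dirname(path)
            for other in result:
                if other == repo_root or other.startswith(repo_root + "/"):
                    result[other] = False
    return result


# The situation seen on the bu mirror: repodata outdated, parent not.
status = {
    "pub/fedora.redhat/linux/updates/10/x86_64": True,
    "pub/fedora.redhat/linux/updates/10/x86_64/repodata": False,
}
fixed = mark_repo_outdated_if_repodata_is(status)
```

With this rule the x86_64 directory in `fixed` ends up marked outdated,
while the input `status` is left unchanged.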

Another issue that Matt mentioned is that report_mirror will tell
mirrormanager to mark any directory that the site claims to have as
up2date, the idea being that mirrors run rsync && report_mirror.  This
does seem to cause problems during mass mirror issues like this one,
though, and Matt also pointed out that some mirrors may run
report_mirror even if the rsync fails.
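
For mirrors that drive their sync from a script, the "rsync &&
report_mirror" idea could look something like the sketch below.  The
wrapper function and its `run` parameter are made up for illustration,
and the command lists are placeholders, not the options any real mirror
uses.

```python
import subprocess


def sync_then_report(rsync_cmd, report_cmd, run=subprocess.call):
    """Run report_cmd only if rsync_cmd exits with status 0.

    `run` takes a command list and returns its exit status; it defaults
    to subprocess.call and can be swapped out for testing.
    """
    rc = run(rsync_cmd)
    if rc != 0:
        # The sync failed, so do not claim to be up to date.
        return rc
    return run(report_cmd)


# Example call (placeholder arguments, not executed here):
#   sync_then_report(["rsync", "-a", "SOURCE", "DEST"], ["report_mirror"])
```

This way a failed rsync never leads to report_mirror being run at all.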

Some issues/responses we discussed:
1) For the first issue, we need to mark an entire repository outdated if
   the repodata is outdated.  This should start happening properly as
   well now that the timestamps issue is fixed, although we can do this
   explicitly in the code as well.
2) MirrorManager currently doesn't check timestamps, and the solution to
   this isn't trivial, especially with FTP, which returns directory
   listing data as just the text of the output.  This is almost
   impossible to parse accurately, especially when time zones are
   involved, since FTP doesn't even return time zone data.
3) It might be good to change some of report_mirror's behavior.  Right
   now, when public mirrors run it, it gives the benefit of starting to
   send traffic to the mirror as soon as possible after syncing, but in
   situations like the current one, this behavior can lead to outdated
   mirrors being marked up2date in MirrorManager.
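
To illustrate the parsing problem in 2), here is a small sketch of
reading the timestamp out of one line of an ls-style FTP listing.  The
listing line is a made-up example and the function is hypothetical; the
point is only that the year has to be guessed and the result carries no
time zone.

```python
from datetime import datetime


def parse_listing_time(line, assumed_year):
    """Parse the "Mon DD HH:MM" timestamp from an ls-style listing line.

    The listing has no year, so the caller must supply one.  Lines that
    show a year instead of a clock time (as ls does for older files) are
    not handled here.
    """
    fields = line.split()
    month, day, clock = fields[5], fields[6], fields[7]
    text = "%d %s %s %s" % (assumed_year, month, day, clock)
    return datetime.strptime(text, "%Y %b %d %H:%M")


# Made-up example line; real servers vary in their listing format.
line = "-rw-r--r--   1 ftp  ftp   1234 Jul 20 05:27 repomd.xml"
parsed = parse_listing_time(line, 2009)
```

Here `parsed.tzinfo` is None, so there is no way to tell from the
listing alone whether 05:27 is UTC or the server's local time.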

Thanks,
Ricky